ABSTRACT



Conventional parsing techniques use grammars as embedded
procedural knowledge bases in mechanisms which are capable
of translating words in the defined language into equivalent
parse trees. The approach described in this paper uses
context-free grammars as data, allowing access to synthesis
templates which enable the user to create and interact with
parse trees directly. The advantages of this approach are
the utility of human-oriented grammars, the dynamic
interchangeability of language definitions, immediate error
rejection, and the ability to handle partially complete parse
trees. The design for a prototype programming environment
using grammar-driven synthesis is presented.
INTRODUCTION



There is a great deal of interest in improving the
efficiency of program and system development, primarily
because software costs have risen dramatically in recent
years as a fraction of total system development costs. One
approach to improving efficiency is to provide an enhanced
set of interactive program development tools for the
programmer and to increase the automation of program
development. Many such efforts involve the notion of a
"programming environment", that is, an interactive
environment in which a wide selection of software tools is
provided as an integrated package, with a consistent and
relatively concise command structure. Typically, a means is
provided to allow the programmer to work within the language
being used for the program, without having to descend to the
object language level to perform any of the functions
necessary to create, modify, or test the program.

As a concrete example, the reader's attention is drawn
to the most widely known integrated programming environment,
the APL system [Iverson, 1962]. When using this system, the
programmer is able to perform all steps in the program
development process without ever having to issue explicit
commands to the host operating system. The APL environment
itself provides an integrated set of facilities for storing,
editing, and debugging modules, which are arranged in
workspaces and libraries accessed by commands that are part
of the APL language definition itself. In addition, so far
as the user is concerned, there is no notion of translating,
linking, or loading individual functions or programs. To the
programmer, the system appears to be capable of evaluating
programs written in APL without translation, and all of the
programmer's interactions with the APL programs defined occur
within the syntactic framework of the original source
language.

Other language-oriented programming environments are
under development or in use, notably the ECL project at
Harvard [Wegbreit et al., 1974], which is based on a
LISP-like programming language, and the GANDALF project
[Habermann, 1979], which is based on the new Department of
Defense language, ADA. Both of these projects are designed
to offer an environment which is even more intensively
syntax-oriented than that offered by APL. In addition,
these systems incorporate into an integrated environment a
wide range of facilities normally provided by the host
operating system. Two human engineering ideas motivate the
design of such systems: freeing the programmer from the
necessity of learning two command structures, and allowing
the parts of the modules being developed to be referenced
and accessed using the natural structure imposed by the
syntax of the language in which they are written.
One of the crucial problems which must be solved in
implementing such an environment is the need to provide more
or less continual access to the evaluable program structure
in a syntax-oriented fashion. Conceptually, the system must
"understand" the syntactical structure of the program during
its entire existence, not simply during the phase in which
it is entered into the system. Thus, the internal structure
of the program must be sufficiently complex to reflect the
syntax of the program at all times, and facilities to
utilize this structure must be on-line during the entire
period of program development. Since this requirement must
be met for other reasons anyway, a syntax-directed editor is
often offered as the primary means of program entry. Such
an editor utilizes the on-line knowledge of program
structure to allow additions, deletions, and modifications
of the program structure to be made based on the natural
syntactical units of the program, rather than the more usual
line-oriented approach.

Our research was originally motivated by this
application of syntax-directed editing, since the program
access algorithms for the editor are the very routines
involved in program structure access throughout its life in
the programming environment. We wished to investigate the
task of generating a syntax-directed editor from a grammar
description, in the hope that procedures for routinely
performing such a task could be described in general terms,
if not altogether automated. The belief that a set of
usable rules could be found was encouraged by the fact that
techniques for generating a functionally analogous system, a
parser, from a BNF grammar description are well understood
and, in fact, frequently automated.

The techniques reported in this paper are fundamentally
very simple, but lie in a direction diametrically opposed to
those involved in parser generation. A parser is a
mechanism for taking a correct word in some language and
recreating, from the grammar of the language, the
syntactical structure inherent in that word. That this
structure can be deduced from what would otherwise be a
meaningless string of symbols is a consequence of the fact
that the programmer created the string using a grammar
equivalent to the one used by the creator of the parser.
The program itself represents a sequentialized version of
two parallel, hierarchical structures, one in the mind of
the programmer, and the other internal to the computer
system. The programmer has encoded the structure into the
message, and the parser is the mechanism needed to decode
it.

Viewed in this light, the use of a parser-based
translation system is a very odd solution indeed to the
problem of entering a program structure into a computer
system for subsequent execution: it is as if a piano were
to be moved into a house by tearing it into small pieces,
appropriately labelling each one, pushing the pieces
through a mail slot, and relying on an automaton inside the
house to reassemble the piano. This procedure is
notoriously error-prone, and once it has been accomplished,
it is extremely difficult for the programmer to gain access
in a human-oriented way to the actual structure built.
Extending the simile used above, it is as if we could only
confirm that the piano had been reconstructed properly by
listening to the music emanating from the interior of the
house after the piano had been reassembled!

Of course, the historical cause for such a solution is
clear: at the time language translation technology was
elaborated, most general-purpose computing systems relied
heavily on sequential, batch-oriented input mechanisms such
as card readers, and were like houses without front doors,
only mail slots. There was a driving need to invent such
mechanisms as parsers so that high-level programming could
be done at all.

However, with the increased reliance on interactive,
remote-entry time-sharing facilities, a radically different
solution to the problem of program entry can be
investigated. The program structure can be interactively
built within the computer in the first place. Such a
solution obviates the need for a parser altogether.
Instead, the editor and the programmer cooperate to build
the desired structure directly. The grammatical
specifications of the language are not used indirectly, to
build a decoder for an unnecessary representation, but are
used simply as data to guide an appropriate, direct
synthesis of a well-structured program representation.

This thesis describes such mechanisms in enough detail
to serve as the basis for the implementation of a
language-independent program entry system. The system is
language-independent in the sense that data corresponding
very closely to the grammar of a context-free language
itself, in the form of a finite set of static
"transformations", is directly interpreted by the system to
form structures well-formed under that grammar. If the
grammar data is changed, the same system supports a new
language.

We have adopted the term "grammar-driven synthesis" to
describe the function of the systems discussed in this
paper, in order to suggest the idea that grammars with a
rich set of operators are utilized as knowledge bases with
little or no pre-processing. This direct utilization of a
human-oriented grammar is to be contrasted, for instance,
with the extensive pre-processing required to derive
transition tables for driving a shift-reduce parser.

Chapter II describes in very general terms several basic
mechanisms for performing such grammar-driven synthesis,
relating them to the fundamental idea of performing a valid
derivation under a context-free grammar. Chapter III
provides a further elaboration of these mechanisms, aimed
toward the more concrete goal of being able not only to
create, but also to modify and delete parts of a
hierarchical program structure in a syntactically
consistent way. Chapter IV, which is something of a
digression, considers from the viewpoint of database design
how programs may be represented and accessed as databases
during modification and during storage or transmission from
one place or time to another. In Chapter V, a conceptual
description is presented of a prototype programming
environment, designed to allow the programming language in
use to be changed by simply changing the language
description installed in the system. This design is
concerned solely with the facilities for program
modification and entry, and is based on the assumption that
a means can be found for describing, in a relatively simple
way, the semantic content of the program structures to be
built. Finally, in Chapter VI, the results of the
research undertaken so far are summarized, and some
suggestions for future investigations are made.
II. GRAMMAR-DRIVEN SYNTHESIS



A. INTRODUCTION 

In this chapter, several models for grammar-driven
editors of increasing complexity are described in terms of
the theory of context-free grammars. Each editor receives
two sequences of input symbols, the first representing a
context-free grammar, and the second a series of commands
which guides the synthesis of a sentential form of the
grammar initially provided. The described mechanisms are
capable of utilizing very general classes of context-free
grammars, including ambiguous and incomplete grammars as
well as grammars with useless productions (i.e., productions
which do not occur in the derivation sequence for any word
of the defined language). For this reason, we adopt the view
that the fundamental product of such a synthesizer is a
sentential form, possibly containing non-terminal as well as
terminal symbols.

The first syntax-directed editor produced by the
research group along the lines outlined in this section was
written in LISP by B. MacLennan in November 1980 and called
"A Universal Syntax-Directed Editor". The primary motivation
for the analysis of grammar-driven synthesis presented in
this chapter was to perform an exhaustive review of the
algorithms employed and to connect them to the mathematical
theory of context-free grammars in such a way as to justify
the adjective "universal", as well as to provide reasonably
convincing informal arguments that no critical loopholes had
been missed. This technique for using a grammar is compared
with conventional parsing techniques, and the feasibility of
using such synthesizers as the foundation of a system
providing interactive access to a hierarchically organized
database (such as that representing an executable program
structure) is discussed.

B. GRAMMARS AND SENTENTIAL FORMS

It is assumed that the reader is familiar with the
Backus-Naur Form, or BNF, notation for mathematical
grammars. Appendix A contains a formal specification for
this notational system. The basic concepts from the theory
of context-free grammars used throughout this section are
adapted from [Hopcroft and Ullman, 1979]. The present
section is provided primarily for background and continuity.

A context-free grammar has the following elements: 

-- A finite set T of terminal symbols,

-- A finite set N of non-terminal symbols,
disjoint from T,

-- A finite set P of productions, each expressed
in BNF notation,

-- A designated target non-terminal t
included in N.
In addition, for the grammar to be context-free, every
production must be of the form

<a> ::= X,

where X is a string (possibly empty) of terminal and
non-terminal symbols, and a is a non-terminal symbol. The
acronym "CFG" is commonly used to abbreviate the phrase
"context-free grammar". Throughout this chapter, we will
adopt the convention of using lower-case letters from the
beginning of the alphabet to represent non-terminal symbols,
lower-case letters from the end of the alphabet to represent
terminal symbols, and upper-case letters to represent
strings (possibly empty) of terminals and non-terminals.
Since we will be considering only context-free grammars, the
term "grammar" will always be understood to mean
"context-free grammar". We shall also assume that all
grammars considered are non-trivial, that is, that the sets
T and P are non-empty.

1. Sentential forms.

The basic intuitive concept underlying the idea of a
context-free grammar is the notion of derivation: the 
replacement in a string of a single non-terminal symbol by 
an equivalent string of terminals and non-terminals as 
specified by some production. 

Let G = {T, N, P, t} be a grammar, and let S(1)
and S(2) be strings of symbols. (We adopt the notational
convenience of using parenthesized integers to subscript
variable names.) Then we say S(1) derives S(2) in one step
if S(1) and S(2) have the form

S(1) = XaZ, S(2) = XYZ,

and there exists a production in the set P with the form

<a> ::= Y.

In this case, we write

S(1) => S(2).

In an analogous fashion, we may define the notion of
a leftmost derivation, for which the string X above contains
no non-terminal symbols.
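The one-step and leftmost derivation relations can be made concrete in a few lines. This is a minimal sketch, not part of the system described here: it assumes that a string of symbols is a Python tuple, that a production <a> ::= Y is the pair (a, Y), and that the set of non-terminals is passed in explicitly. The names `derive_one_step` and `leftmost_step`, and the toy grammar, are hypothetical.

```python
from typing import FrozenSet, Tuple

# Assumed representation (illustrative only): a string of symbols is a tuple,
# and a production <a> ::= Y is the pair (a, Y).
Symbols = Tuple[str, ...]
Production = Tuple[str, Symbols]


def derive_one_step(s: Symbols, position: int, production: Production,
                    nonterminals: FrozenSet[str]) -> Symbols:
    """Return S(2) such that S(1) => S(2), where S(1) = XaZ and a is at `position`."""
    lhs, rhs = production
    if lhs not in nonterminals or s[position] != lhs:
        raise ValueError("production does not apply at this position")
    # X a Z  becomes  X Y Z
    return s[:position] + rhs + s[position + 1:]


def leftmost_step(s: Symbols, production: Production,
                  nonterminals: FrozenSet[str]) -> Symbols:
    """One leftmost derivation step: X must contain no non-terminal symbols."""
    for i, sym in enumerate(s):
        if sym in nonterminals:
            return derive_one_step(s, i, production, nonterminals)
    raise ValueError("no non-terminal left to expand")


# Hypothetical toy grammar:  <a> ::= x a y   and   <a> ::= (empty)
N = frozenset({"a"})
P1: Production = ("a", ("x", "a", "y"))
P2: Production = ("a", ())
print(derive_one_step(("a",), 0, P1, N))       # ('x', 'a', 'y')
print(leftmost_step(("x", "a", "y"), P2, N))   # ('x', 'y')
```

Checking the left-hand side against the symbol actually found at the chosen position is exactly the condition S(1) = XaZ in the definition.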

A string S is said to derive a string S' in zero or
more steps, or simply to derive S', if one of the following
conditions is true: either S = S', or else there exists a
series of strings S(1), S(2), ..., S(n) such that
S => S(1), S(1) => S(2), ..., S(n) => S'. In this
case, we write

S *=> S'.

A string W is said to be a sentential form of G if
t *=> W, where t is the target symbol of G. A sentential
form with no non-terminal symbols is called a word. The set
of all such words is called the language defined by G. Such
a language is called a context-free language, or "CFL".
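Because *=> is the reflexive, transitive closure of =>, the sentential forms reachable within a fixed number of steps can be enumerated breadth-first. The sketch below is illustrative only: the step bound `max_steps` is an added assumption so that the search terminates (the set of sentential forms is generally infinite), and the helper names and toy grammar are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Symbols = Tuple[str, ...]
Production = Tuple[str, Symbols]


def derivable(start: Symbols, productions: List[Production],
              nonterminals: FrozenSet[str], max_steps: int) -> Set[Symbols]:
    """All S' with start *=> S' using at most max_steps single derivation steps."""
    seen = {start}            # zero steps: S *=> S
    frontier = {start}
    for _ in range(max_steps):
        next_frontier = set()
        for s in frontier:
            for i, sym in enumerate(s):
                if sym not in nonterminals:
                    continue
                for lhs, rhs in productions:
                    if lhs == sym:                       # S => S' in one step
                        t = s[:i] + rhs + s[i + 1:]
                        if t not in seen:
                            seen.add(t)
                            next_frontier.add(t)
        frontier = next_frontier
    return seen


def words(forms: Set[Symbols], nonterminals: FrozenSet[str]) -> Set[Symbols]:
    """Sentential forms containing no non-terminal symbols."""
    return {w for w in forms if not any(sym in nonterminals for sym in w)}


# Hypothetical toy grammar with target a:  <a> ::= x a y   and   <a> ::= (empty)
N = frozenset({"a"})
P = [("a", ("x", "a", "y")), ("a", ())]
forms = derivable(("a",), P, N, max_steps=2)
print(sorted(words(forms, N)))    # [(), ('x', 'y')]
```

Starting the search from the target symbol t makes every string in `forms` a sentential form of G, and `words` filters out those that belong to the language.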

A grammar is said to be ambiguous if there exists a
word in the language defined by the grammar with two or more
distinct leftmost derivations. There exist languages
defined by a context-free grammar that are inherently
ambiguous: that is, which cannot be defined by an
unambiguous context-free grammar.
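Ambiguity can be checked on a small case by counting leftmost derivations. To guarantee termination, the sketch below assumes a grammar with no empty right-hand sides and no cycles of single-non-terminal productions, so that no step shortens the string and any form longer than the target word can be discarded. The grammar e ::= e "+" e | "x" is a standard illustrative example, not one taken from the text. It has two distinct leftmost derivations, and hence two parse trees, for x + x + x.

```python
from typing import FrozenSet, List, Tuple

Symbols = Tuple[str, ...]
Production = Tuple[str, Symbols]


def count_leftmost(form: Symbols, word: Symbols, productions: List[Production],
                   nonterminals: FrozenSet[str]) -> int:
    """Number of distinct leftmost derivations form *=> word.

    Assumes no empty right-hand sides and no cycles of unit productions,
    so forms never shrink and every branch of the search terminates.
    """
    if len(form) > len(word):
        return 0                                  # can never shrink back to `word`
    for i, sym in enumerate(form):
        if sym in nonterminals:
            break                                 # i, sym = leftmost non-terminal
    else:
        return 1 if form == word else 0           # a word: success only if it matches
    if form[:i] != word[:i]:
        return 0                                  # terminal prefix already disagrees
    return sum(count_leftmost(form[:i] + rhs + form[i + 1:], word, productions, nonterminals)
               for lhs, rhs in productions if lhs == sym)


# Illustrative ambiguous grammar:  <e> ::= e + e   and   <e> ::= x
N = frozenset({"e"})
P = [("e", ("e", "+", "e")), ("e", ("x",))]
print(count_leftmost(("e",), ("x", "+", "x", "+", "x"), P, N))   # 2
```

A count greater than one for some word demonstrates ambiguity; a count of one for every word tested proves nothing about the grammar in general.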

2. ARGOT notation.

While BNF notation is convenient for theoretical
manipulations because it incorporates a single underlying
idea, that of replacement in accordance with a production, a
more expressive notation for the practical specification of
languages is desirable.

For our purposes, we will adapt a system of notation
called ARGOT notation, with a concise yet powerful set of
replacement operators reminiscent of the operators used in
the theory of regular expressions. This notation was
developed as the core of a pattern-matching programming
language called ARGOT [MacLennan, 1975]. In fact, we will
use a restricted version of this notation, but it is
convenient to introduce the full notation first and then
restrict it as required. A formal description of ARGOT
notation is provided in Appendix A.

a. Rules and ARGOT expressions. 

In place of a set of productions, ARGOT uses a 
list of named rules, each of the form: 

name: expression. 

Rule names perform the same role in ARGOT notation as non- 
terminal symbols in BNF notation; however, it is required 
that each rule have a unique rule name.
Terminal symbols or strings are denoted by
underlining, use of boldface type, or enclosure by quote
marks ("), whichever is appropriate for the typeface
available.

The colon corresponds to the BNF metasymbol "::=",
separating the rule name from the expression denoting
how an occurrence of that rule name may be expanded. Rules
are terminated by periods so that they are separated
unambiguously.

The expression half of a rule is an indefinitely
deep hierarchy of elementary replacement operations and
sub-expressions, eventually terminating at the deepest
levels with terminal strings or rule names. Each operator
allows a specific replacement operation, which may be
thought of as being applied from the shallowest level of the
hierarchy downward in a non-deterministic fashion. Thus, a
single ARGOT rule corresponds to a number of equivalent BNF
productions.

b. Concatenation.

The simplest replacement operator is that of
concatenation, or replacement of a single construct by a
series of sub-constructs. The concatenation operator is
denoted by simple juxtaposition. Concatenated expressions
may be grouped into a single construct and used as a
sub-expression by means of parentheses. A single BNF
production expresses the same idea as a simple ARGOT
concatenation (except that in ARGOT an "empty" rule cannot
occur). Thus, the BNF production

<program> ::= program <identifier> <block>

is equivalent to the ARGOT rule

program: "program" identifier block.

The occurrence of a rule name means that that position in
the sequence is to be expanded as defined by the named rule,
while the occurrence of a terminal string means that that
position in the sequence is to be filled by the quoted
string.

c. Optional constructs. 

An optional sub-expression is surrounded by
square brackets. The meaning of this operator is that at the
specified point, the indicated sub-expression may either be
placed into the symbol string or omitted. Thus, the rule

statement: [ label ] action.

allows replacement of "statement" by either "label action"
or "action".

d. Alternation operators.

Two alternation operators are provided, simple
and optional alternation. Simple alternation is denoted by
a list of sub-expressions separated by vertical strokes
and surrounded by curly brackets. The construct may be
expanded by choosing one of the sub-constructs as the
replacement. Thus, by the rule

digit: { "0" | "1" | "2" }.

the rule name "digit" may be replaced by any one of "0",
"1", or "2".

The optional alternation construct is denoted in
the same way as a simple alternation, except that square
brackets are used instead of curly brackets. This operator
allows replacement not only by any of the indicated
alternatives, but also by the empty string. For example,
the rule

sign: [ "+" | "-" ].

allows the rule name "sign" to be replaced by "+" or by "-",
or to be deleted (replaced by the empty string).
e. Iteration operators. 

Three iteration operators are provided. The
required iteration, or simple iteration, is denoted by a
plus sign followed by a sub-expression. This construct
allows replacement by one or more instances of the
sub-expression. Thus, the rule

integer: + digit.

means that an instance of "integer" can be replaced by
"digit", by "digit digit", by "digit digit digit", etc.

Optional iteration, denoted by an asterisk followed
by a sub-expression, implies that the construct can be
replaced by zero or more instances of the sub-expression.
Thus, the rule

astring: * "a".

allows expansion of the rule name "astring" to the empty
string, or to any of the strings "a", "aa", "aaa", etc.
The final form of iteration, list iteration, is
denoted by surrounding two sub-expressions with a sharp sign
on the left and three periods on the right. It allows
replacement by one or more instances of the first
sub-expression, separated by instances of the second
sub-expression. Thus, the rule

list: # atom "," ... .

allows replacement of the rule name "list" by "atom",
"atom, atom", "atom, atom, atom", etc.
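The replacement operators above can be summarized by a small function listing what a single application of an ARGOT expression may produce. This is a hypothetical sketch: the nested-tuple encoding of expressions, the name `expand`, and the cap `max_reps` on iterations (needed because iteration is unbounded) are illustrative assumptions, not the notation of Appendix A. Terminal strings are shown in quotes to keep them distinct from rule names.

```python
from itertools import product
from typing import Set, Tuple

# Hypothetical encoding of ARGOT expressions as nested tuples:
#   ("t", "0")                  terminal string "0"
#   ("r", "digit")              rule name digit
#   ("cat", e1, e2, ...)        concatenation (juxtaposition)
#   ("opt", e)                  [ e ]
#   ("alt", e1, e2, ...)        { e1 | e2 | ... }
#   ("optalt", e1, e2, ...)     [ e1 | e2 | ... ]
#   ("plus", e)                 + e
#   ("star", e)                 * e
#   ("list", e, sep)            # e sep ...


def expand(expr: tuple, max_reps: int = 3) -> Set[Tuple[str, ...]]:
    """Symbol strings that one application of `expr` may yield (iterations capped)."""
    op = expr[0]
    if op == "t":
        return {('"' + expr[1] + '"',)}              # quote terminals
    if op == "r":
        return {(expr[1],)}
    if op == "cat":
        parts = [expand(e, max_reps) for e in expr[1:]]
        return {sum(combo, ()) for combo in product(*parts)}
    if op == "opt":
        return {()} | expand(expr[1], max_reps)
    if op in ("alt", "optalt"):
        choices = set().union(*(expand(e, max_reps) for e in expr[1:]))
        return (choices | {()}) if op == "optalt" else choices
    if op in ("plus", "star", "list"):
        item = expand(expr[1], max_reps)
        sep = expand(expr[2], max_reps) if op == "list" else {()}
        results = {()} if op == "star" else set()
        seqs = item                                  # sequences of exactly one instance
        for _ in range(max_reps):
            results |= seqs
            seqs = {a + s + b for a in seqs for s in sep for b in item}
        return results
    raise ValueError(f"unknown operator {op!r}")


statement = ("cat", ("opt", ("r", "label")), ("r", "action"))
print(sorted(expand(statement)))       # [('action',), ('label', 'action')]
```

Each branch mirrors one operator: the optional forms add the empty string as an alternative, and list iteration interleaves separator instances between the item instances.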

f. Properties of the ARGOT notation. 

The most important feature of the notation is
that, although it is richer in operators and in this sense
more expressive than BNF notation, it is not more powerful:
a language is context-free if, and only if, it is
expressible as a finite set of ARGOT rules. This can be
shown by reducing ARGOT to BNF notation, that is, by
providing algorithms for transforming any finite set of
context-free BNF productions to an equivalent set of ARGOT
rules, and vice versa. This constructive proof is
straightforward and uninformative, as the desired
transformations are fairly evident on an intuitive level.
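One direction of that reduction, from ARGOT to BNF, can be sketched by giving every sub-expression other than a concatenation its own helper non-terminal. This is an illustrative construction under a hypothetical nested-tuple encoding of ARGOT expressions (for example ("opt", ("r", "label")) for [ label ]). The helper names such as statement_1 are invented, and iteration is rendered by right recursion, which is one of several equivalent choices.

```python
import itertools
from typing import Dict, List, Tuple

Production = Tuple[str, Tuple[str, ...]]

# Hypothetical ARGOT encoding: ("t", s) terminal, ("r", name) rule name,
# ("cat", ...), ("opt", e), ("alt", ...), ("optalt", ...),
# ("plus", e), ("star", e), ("list", e, sep).


def argot_to_bnf(rules: Dict[str, tuple]) -> List[Production]:
    """Translate named ARGOT rules into equivalent BNF productions."""
    productions: List[Production] = []
    counter = itertools.count(1)

    def symbols(expr: tuple, owner: str) -> Tuple[str, ...]:
        """Symbols standing for `expr` on a right-hand side."""
        op = expr[0]
        if op == "t":
            return ('"' + expr[1] + '"',)
        if op == "r":
            return (expr[1],)
        if op == "cat":
            return sum((symbols(e, owner) for e in expr[1:]), ())
        helper = f"{owner}_{next(counter)}"            # fresh non-terminal
        define(helper, expr, owner)
        return (helper,)

    def define(name: str, expr: tuple, owner: str) -> None:
        """Emit productions making `name` derive exactly what `expr` derives."""
        op = expr[0]
        if op in ("t", "r", "cat"):
            productions.append((name, symbols(expr, owner)))
        elif op == "opt":                              # name ::= (empty) | e
            productions.append((name, ()))
            productions.append((name, symbols(expr[1], owner)))
        elif op in ("alt", "optalt"):                  # name ::= e1 | e2 | ...
            if op == "optalt":
                productions.append((name, ()))
            for e in expr[1:]:
                productions.append((name, symbols(e, owner)))
        elif op in ("plus", "star", "list"):
            item = symbols(expr[1], owner)
            sep = symbols(expr[2], owner) if op == "list" else ()
            # plus: name ::= e | e name;  star: name ::= (empty) | e name;
            # list: name ::= e | e sep name
            productions.append((name, () if op == "star" else item))
            productions.append((name, item + sep + (name,)))
        else:
            raise ValueError(f"unknown operator {op!r}")

    for rule_name, rule_expr in rules.items():
        define(rule_name, rule_expr, rule_name)
    return productions


bnf = argot_to_bnf({"statement": ("cat", ("opt", ("r", "label")), ("r", "action"))})
for lhs, rhs in bnf:
    print(lhs, "::=", " ".join(rhs))
```

The empty right-hand sides produced for optional constructs are legal in BNF, which is why the reverse direction needs the optional operators to absorb them.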

As originally defined, the complete ARGOT
programming language, which allows syntactically-keyed
computation as well as input and output parameters to be
passed between rules, has the full computational power of
the lambda calculus [MacLennan, 1975]. The notational subset
we are here calling "ARGOT notation" does not have the full
power of the ARGOT language defined in this reference.

The notation can also be regarded as a
generalization of the notion of a regular expression. We may
think of a set of ARGOT rules as being a set of named regular
expressions, and then allow rules to refer to themselves
directly or indirectly to achieve the power of a
context-free grammar. This notational similarity allows the
simple statement of a sufficient (but not necessary)
condition for the regularity of an ARGOT-defined language.
If a finite set of ARGOT rules can be arranged in such an
order that the right-hand side of each rule refers only to
rules occurring further down the list, the language defined
is regular. That this is so can be seen fairly readily.
Such an ordering allows each rule name other than that of
the target to be replaced by the right-hand side of the rule
it names, working from the bottom of the list upward; since
no rule refers back to itself, this sequence of
substitutions terminates. The resulting single rule is
simply a regular expression with operators and terminal
strings alone on the right-hand side.
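The ordering condition amounts to requiring that the graph of rule references be acyclic, and the substitution argument can be carried out mechanically. The sketch below is a hypothetical illustration: it uses a nested-tuple encoding of ARGOT expressions and Python's `re` syntax as the target regular-expression notation, neither of which comes from the text. It returns an ordering when one exists and inlines all rule names into a single pattern.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical encoding: ("t", s), ("r", name), ("cat", ...), ("opt", e),
# ("alt", ...), ("optalt", ...), ("plus", e), ("star", e), ("list", e, sep).


def references(expr: tuple) -> Set[str]:
    """Rule names mentioned anywhere inside an expression."""
    if expr[0] == "r":
        return {expr[1]}
    if expr[0] == "t":
        return set()
    return set().union(*(references(e) for e in expr[1:]))


def ordering(rules: Dict[str, tuple]) -> Optional[List[str]]:
    """Order rules so each refers only to rules further down; None if impossible."""
    order: List[str] = []
    state: Dict[str, int] = {}                 # 1 = being visited, 2 = finished

    def visit(name: str) -> bool:
        if state.get(name) == 2:
            return True
        if state.get(name) == 1:
            return False                       # a rule refers back to itself
        state[name] = 1
        ok = all(visit(r) for r in references(rules[name]) if r in rules)
        state[name] = 2
        order.append(name)                     # appended after everything it uses
        return ok

    if not all(visit(name) for name in rules):
        return None
    return order[::-1]                         # users first, used rules further down


def to_regex(expr: tuple, rules: Dict[str, tuple]) -> str:
    """Inline rule names into one pattern; assumes `ordering` succeeded and all rules exist."""
    op = expr[0]
    if op == "t":
        return re.escape(expr[1])
    if op == "r":
        return "(?:" + to_regex(rules[expr[1]], rules) + ")"
    if op == "cat":
        return "".join(to_regex(e, rules) for e in expr[1:])
    if op == "opt":
        return "(?:" + to_regex(expr[1], rules) + ")?"
    if op in ("alt", "optalt"):
        body = "|".join(to_regex(e, rules) for e in expr[1:])
        return "(?:" + body + ")" + ("?" if op == "optalt" else "")
    if op in ("plus", "star"):
        return "(?:" + to_regex(expr[1], rules) + ")" + ("+" if op == "plus" else "*")
    if op == "list":
        item, sep = to_regex(expr[1], rules), to_regex(expr[2], rules)
        return "(?:" + item + "(?:" + sep + item + ")*)"
    raise ValueError(f"unknown operator {op!r}")


rules = {
    "signed": ("cat", ("optalt", ("t", "+"), ("t", "-")), ("r", "integer")),
    "integer": ("plus", ("r", "digit")),
    "digit": ("alt", ("t", "0"), ("t", "1"), ("t", "2")),
}
print(ordering(rules))                               # ['signed', 'integer', 'digit']
pattern = to_regex(rules["signed"], rules)
print(bool(re.fullmatch(pattern, "-120")))           # True
```

Only an acyclic rule set is passed to `to_regex`; a self-referencing rule would make the inlining recurse forever, which is exactly why the ordering condition is needed.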

This result is of practical use, since if we
know that a language is regular, then we know that simple
(non-recursive) algorithms exist for processing it. These
algorithms are considerably less complicated than those
required if the language is context-free but not regular,
in which case some sort of recursive mechanism is required.
3. Restricted ARGOT notation (R-ARGOT).

The full ARGOT notation, as described, has more
expressive power than required for the application we are
interested in, for two reasons:

-- its indefinitely nested structure requires recursive
routines to access the sub-expressions in a rule, and
-- highly nested expressions are too complicated to
express easily learned syntax units for the user.

That the notation allows indefinite nesting is implied by
the fact that the notation itself is an inherently
context-free language. Since we shall be accessing the
grammatical descriptions of languages as databases, it is
highly desirable to be able to describe and encode simple,
efficient access routines. In addition, a simpler notation
will allow us to conceptualize a given grammar as consisting
of a collection of rules, each of which is formatted in one
of a finite number of ways.

What we would like is a notation that is expressible
as a regular expression (as is BNF notation), so that it is
easily processed, but that retains an adequate amount of
expressive power. These goals are met by appropriately
restricting the nesting allowed within ARGOT expressions.
The resulting notation is called R-ARGOT notation (for
either restricted or regular ARGOT).

The set of available operators is restricted to
concatenation, required iteration, simple alternation, list
iteration, and the optional operator. The other operators
are rendered superfluous by the nesting restriction.

R-ARGOT expressions (rule right-hand sides) may be
simple or complex. A simple expression is a concatenation
of one or more terminal strings, rule names, or optional
rule names. A complex expression is an alternation, a
required iteration, or a list iteration. Any sub-expression
in an alternation or iteration must be a rule name. The
first sub-expression in a list iteration must be a rule
name; the second may be either a rule name or a terminal
string.

The effect of these rules is to limit the number of
possible formats available to the grammar designer to a
small set. Alternation and simple iteration operators will
always be the topmost operator in a given rule expression if
they occur at all, and the operands will be simple rule
names in such expressions. The list iteration operator must
also be topmost, and only its second operand may be other
than a rule name, in which case it must be a single
terminal string. Only if the concatenation operator is
topmost may the operands be optional rule names, and even
in this case no further operators are allowed in the rule.
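Because R-ARGOT admits only a handful of rule formats, checking that a rule is legal requires no recursion. The following is a minimal sketch under a hypothetical nested-tuple encoding. It follows the definition of simple and complex expressions given above, a simple expression being a concatenation of terminal strings, rule names, or optional rule names. The function name `classify` is invented for illustration.

```python
from typing import Optional

# Hypothetical encoding: ("t", s), ("r", name), ("opt", e), ("cat", ...),
# ("alt", ...), ("plus", e), ("list", e, sep).


def is_rule(e: tuple) -> bool:
    return e[0] == "r"


def is_term(e: tuple) -> bool:
    return e[0] == "t"


def classify(expr: tuple) -> Optional[str]:
    """Return "simple" or "complex" for a legal R-ARGOT right-hand side, else None."""
    op = expr[0]
    # Simple: one or more terminal strings, rule names, or optional rule names.
    items = expr[1:] if op == "cat" else (expr,)
    if items and all(is_term(e) or is_rule(e) or (e[0] == "opt" and is_rule(e[1]))
                     for e in items):
        return "simple"
    # Complex: a topmost alternation or required iteration over rule names only...
    if op in ("alt", "plus") and len(expr) > 1 and all(is_rule(e) for e in expr[1:]):
        return "complex"
    # ...or a topmost list iteration: a rule name, then a rule name or terminal string.
    if op == "list" and is_rule(expr[1]) and (is_rule(expr[2]) or is_term(expr[2])):
        return "complex"
    return None                                   # any other nesting is not R-ARGOT


print(classify(("alt", ("t", "a"), ("t", "b"))))   # None: terminals must be renamed
```

The rejected example in the last line is precisely the case that forces the renaming of terminal alternatives discussed next.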

It is something of a surprise that such stringent
restrictions result in grammars that are reasonably well
oriented toward human comprehension. The rules that result,
when they are read informally, seem to express natural
syntactic units. It must be admitted that an improvement in
human comprehensibility might be attained by allowing one
level of nesting. However, the simplifications in the
rule-access algorithms provided by naming each
sub-expression are so striking that we have been led to
retain R-ARGOT as described here.

The languages defined in Appendices A and B are
defined using the R-ARGOT notation. In particular, the
reader's attention is drawn to Appendix B, which contains a
grammar for the PASCAL programming language. Most of the
syntactic rules can be seen to correspond to natural
syntactic constructs within the language in a way that BNF
productions do not.

One irritation encountered in the use of R-ARGOT is
the implicit requirement to rename terminal strings which
carry semantic information (that is, which occur as
alternatives within an alternation). Where we would like to
write, for instance, rules such as

string: + character.

character: { "a" | "b" | ... | "z" }.

we must instead write

string: + character.
character: { a | b | ... | z }.

a: "a".
b: "b".
...
z: "z".

To avoid the necessity of providing a large number of
trivial rules renaming tokens, we shall assume the existence
of a facility in the system for escaping from the normal mode
of grammar-driven synthesis to predefined lexical
synthesizers. Such a facility is analogous to the separation
of the analysis task between the parser and scanner in a
conventional compiler. Thus, we will assume that predefined
rules exist with such names as "identifier", "integer",
"string", etc. In the system to be implemented, these rule
names correspond to predefined input scanners and parsers
available to the language implementer.
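Such an escape can be pictured as a table mapping each predefined rule name to a scanner for complete tokens. The sketch is hypothetical throughout. The rule names are those mentioned above, but the regular expressions defining them (an identifier as a letter followed by letters or digits, a string as text between single quotes) are assumptions chosen for illustration, not definitions from the text.

```python
import re
from typing import Optional

# Hypothetical token definitions for the predefined lexical rules.
PREDEFINED = {
    "identifier": re.compile(r"[A-Za-z][A-Za-z0-9]*"),
    "integer": re.compile(r"[0-9]+"),
    "string": re.compile(r"'[^']*'"),
}


def synthesize_token(rule: str, text: str) -> Optional[str]:
    """Accept `text` as a complete instance of a predefined lexical rule, or return None."""
    pattern = PREDEFINED.get(rule)
    if pattern is None:
        raise KeyError(f"{rule!r} is not a predefined lexical rule")
    return text if pattern.fullmatch(text) else None


print(synthesize_token("integer", "1980"))     # 1980
print(synthesize_token("identifier", "2x"))    # None
```

With such a table, the grammar can simply name "identifier" or "integer", and the many single-character renaming rules become unnecessary.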

C. A SIMPLE GRAMMAR-DRIVEN STRING EDITOR 

In this section, a simple mechanism capable of
generating sentential forms from an input grammar in BNF
notation is described. This mechanism serves as the
fundamental model for grammar-driven editing using
interactive production selection to direct the course of
the synthesis.
1. The Basic Mechanism.



We may think of the basic mechanism, which will be
hereafter referred to as a Grammar-Driven String Editor
(GDSE), as a multitape Turing Machine with two input tapes,
labeled PHASE1 INPUT and PHASE2 INPUT, four internal tapes
labeled GRAMMAR, BUFFER, CURSOR, and PRODUCTION, and an
output tape labeled OUTPUT. The PHASE1 INPUT tape contains a
context-free BNF grammar, which is stored internally on the
GRAMMAR tape. The PHASE2 INPUT tape contains a series of
editing commands which will be more fully described shortly.
The BUFFER tape is used as a work area to synthesize a
sentential form. The CURSOR and PRODUCTION tapes are used to
hold indefinitely large integers which number the
non-terminal in the BUFFER currently being expanded and the
production being applied from the GRAMMAR tape,
respectively. The OUTPUT tape is provided simply as a
conceptual convenience: it is used to model the transfer of
the final form produced to secondary storage.

The operation of the mechanism is as follows: 
a. Phase One -- Copy and Check Grammar. 

The PHASE1 INPUT tape is copied onto the GRAMMAR
tape. As this is done, the contents of the input tape are
parsed in accordance with the grammar listed in Appendix A
for BNF notation. Since this grammar is regular, the input
tape can be accepted or rejected as a legitimate
context-free grammar in a finite number of steps. Without
loss of generality, we assume that the first production
names the target symbol as its left-hand side.
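Phase One can be sketched as a loader that copies the grammar while rejecting malformed input. This is an assumption-laden illustration: since the BNF grammar of Appendix A is not reproduced here, the sketch assumes one production per line, written <a> ::= X with symbols separated by blanks and non-terminals enclosed in angle brackets. The function name `load_grammar` is hypothetical. As in the text, the left-hand side of the first production is taken as the target.

```python
import re
from typing import List, Tuple

Production = Tuple[str, Tuple[str, ...]]
NONTERMINAL = re.compile(r"<[^<>\s]+>")        # assumed form of a non-terminal


def is_nonterminal(sym: str) -> bool:
    return NONTERMINAL.fullmatch(sym) is not None


def load_grammar(text: str) -> Tuple[List[Production], str]:
    """Copy productions from PHASE1 INPUT text, rejecting malformed input."""
    grammar: List[Production] = []
    for line in text.splitlines():
        if not line.strip():
            continue                              # blank lines are skipped
        lhs, sep, rhs = line.partition("::=")
        lhs = lhs.strip()
        if not sep or not is_nonterminal(lhs) or "::=" in rhs:
            raise ValueError(f"not a BNF production: {line!r}")
        symbols = tuple(rhs.split())              # the right-hand side may be empty
        for sym in symbols:
            if ("<" in sym or ">" in sym) and not is_nonterminal(sym):
                raise ValueError(f"malformed symbol {sym!r} in {line!r}")
        grammar.append((lhs, symbols))
    if not grammar:
        raise ValueError("no productions found")
    return grammar, grammar[0][0]                 # first production names the target


grammar, target = load_grammar("<a> ::= x <a> y\n<a> ::=\n")
print(target, grammar)
```

Every check here is a single left-to-right scan of one line, in keeping with the observation that the grammar of BNF notation is itself regular.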

b. Phase Two -- Initialization. 

In Phase Two, the mechanism is used to generate
sentential forms via valid derivation steps on the BUFFER
tape. First, the target non-terminal is copied from the
first production onto the BUFFER tape. Then the following
loop is executed. Each cycle corresponds to one step of a
valid derivation.

c. Phase Two -- Loop. 

A symbol is read from the PHASE2 INPUT tape. If
it is 'Q' (for 'Quit'), control is passed to the next step
beyond the loop.

If the order to quit is not received, two
integers are copied from the PHASE2 INPUT tape. These
integers are assumed to encode the relative position in the
buffer of the next non-terminal to be replaced, and the
production in the grammar to be used to replace it. Both of
the integers must be checked to be sure that they refer to a
real non-terminal in the BUFFER and to a real production in
the GRAMMAR. If they do, the left-hand side of the selected
production is checked to make sure it is the same as the
selected non-terminal. If any of these checks fail, the
integers are simply ignored and the loop is re-entered from
the beginning. Otherwise, the indicated replacement is
performed. In detail, the mechanism proceeds as follows.
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First* an integer (suitably encoded) is read 
from PHASE2 INPUT and placed in the CURSOR register. Sup- 
pose this integer is N. The N'th non-terminal symool on the 
BUFFER tape is located. If there is none* control is 
returned to the top of the loop. 

Another integer is then read from PHASE2 INPUT
and copied onto the PRODUCTION tape. Suppose it is M. The
M'th production is located; if there is none, control is
returned to the top of the loop.

The heads are then moved to the N'th non-terminal
on the BUFFER tape and to the left-hand side of the M'th
production, and the two non-terminals are compared. If they
are not the same, control is returned to the top of the
loop.

If they are the same, the right-hand side of the
M'th production is used to replace the N'th non-terminal on
the BUFFER tape, moving characters to the right to make room
for the new symbols as needed.

Finally, control is returned to the top of the
loop.



d. Phase Two -- End.

The BUFFER tape is copied to OUTPUT and the
machine halts, accepting.

e. Synopsis.

The algorithm described is nothing more than a
restatement, in somewhat more detailed terms, of the
fundamental method for producing some valid sentential form
under a context-free grammar. Determinism has been
introduced by using an additional input phase, which
encodes, as the derivation proceeds, choices for the next
non-terminal to be expanded and the production to be used.
Erroneous input during this phase is ignored. This simple
mechanism captures the essential flavor of grammar-driven
synthesis. We may note that the contents of the PHASE2 INPUT
tape may be obtained in sequence when they are needed, and
are never re-used. Thus, this input process serves as an
entirely adequate model for an interactive process.
Throughout the remainder of this section, we will assume
that the "Phase Two User" is able to examine the internal
state of the machine in order to determine the current state
of the synthesis and decide what to do next. We make this
assumption to avoid cluttering the mechanism descriptions
with output routines, which in any event have no impact on
the current state of the synthesis.
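The Phase Two loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the original mechanism: the GRAMMAR tape is modelled as a list of `(lhs, rhs)` productions that Phase One has already accepted, non-terminals are assumed to be written in angle brackets, and the PHASE2 INPUT tape is modelled as a sequence whose entries are either `"Q"` or an `(N, M)` pair. The names `gdse`, `is_nonterminal` and `EXPR` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple, Union

Production = Tuple[str, Tuple[str, ...]]   # (left-hand side, right-hand side)
Command = Union[str, Tuple[int, int]]      # "Q" or an (N, M) pair


def is_nonterminal(symbol: str) -> bool:
    # Assumption: non-terminals are written in angle brackets, as in BNF.
    return symbol.startswith("<") and symbol.endswith(">")


def gdse(grammar: Sequence[Production],
         phase2_input: Sequence[Command]) -> Optional[List[str]]:
    """Run Phase Two; return the accepted form, or None if 'Q' never arrives."""
    # Initialization: the target symbol is the left-hand side of production 1.
    buffer = [grammar[0][0]]
    for command in phase2_input:
        if command == "Q":
            return buffer                   # copy BUFFER to OUTPUT, accept
        n, m = command                      # CURSOR and PRODUCTION, 1-based
        positions = [i for i, s in enumerate(buffer) if is_nonterminal(s)]
        if not (1 <= n <= len(positions) and 1 <= m <= len(grammar)):
            continue                        # no such non-terminal/production
        pos = positions[n - 1]
        lhs, rhs = grammar[m - 1]
        if buffer[pos] != lhs:
            continue                        # production does not apply here
        buffer[pos:pos + 1] = list(rhs)     # replace N'th non-terminal by RHS
    return None                             # input exhausted without 'Q'


EXPR = [("<expr>", ("<expr>", "+", "<term>")),
        ("<expr>", ("<term>",)),
        ("<term>", ("x",))]
print(gdse(EXPR, [(1, 1), (1, 2), (5, 3), (2, 3), "Q"]))  # ['<term>', '+', 'x']
```

In the usage line the out-of-range pair `(5, 3)` is silently ignored, and the machine accepts the partial form `<term> + x`, which still contains a non-terminal.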

2. Properties of the GDSE.

The fundamental property possessed by the GDSE is
that it never contains an invalid form in the BUFFER, and
that for any desired sentential form a PHASE2 INPUT string
exists which will cause the machine to halt, accepting, with
that form on the OUTPUT tape.

In one sense, these assertions are hardly
susceptible to a convincing proof, since the mechanism is so
obviously related to the notion of valid derivation in the
first place that any proof is likely to be less convincing
than this intuition. The proof can nevertheless be carried
through by induction over the number of times the mechanism
passes through the loop. Since the BUFFER contains a valid
sentential form (the target symbol) when the loop is entered
the first time, and each step in the loop either leaves the
BUFFER unchanged or changes one valid form to another by
expanding a single non-terminal in accordance with a
production in the input grammar, the BUFFER contains a valid
sentential form whenever the loop is entered. When the 'Q'
symbol is read, the last form generated is placed on the
OUTPUT tape prior to acceptance. (The machine may reject if
the 'Q' symbol is missing.)

Given a desired sentential form, there exists some
valid derivation sequence, starting with the target symbol,
such that each form derives the next in one step and the
last is the desired form. (There may be more than one such
sequence.) Each step consists of the selection of a
non-terminal in the current form and its replacement by the
right-hand side of some production. Thus, given the list of
derivation steps, it is easy to construct a list of pairs of
integers for the PHASE2 INPUT tape which will recreate these
steps in the BUFFER. Hence for any sentential form, there
exists a PHASE2 INPUT tape which will cause that form to
appear in the BUFFER. Appending a 'Q' to this tape will
cause the machine to halt, accepting, with the desired form
on the OUTPUT tape.
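The construction used in this argument is entirely mechanical. As a hedged illustration, the following Python sketch recovers the `(N, M)` pairs from a derivation given as a list of sentential forms, by searching at each step for a non-terminal position and a production that reproduce the next form. It assumes non-terminals are written in angle brackets and that N and M are 1-based; the function name `phase2_tape` and the example grammar `EXPR` are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

Production = Tuple[str, Tuple[str, ...]]


def is_nonterminal(symbol: str) -> bool:
    # Assumption: non-terminals are written in angle brackets.
    return symbol.startswith("<") and symbol.endswith(">")


def phase2_tape(grammar: Sequence[Production],
                forms: Sequence[Sequence[str]]) -> List[Union[str, Tuple[int, int]]]:
    """Recover (N, M) pairs from a derivation, i.e. a list of sentential
    forms in which each form derives the next in one step; append 'Q'."""
    tape: List[Union[str, Tuple[int, int]]] = []
    for before, after in zip(forms, forms[1:]):
        positions = [i for i, s in enumerate(before) if is_nonterminal(s)]
        # First (N, M) whose replacement turns `before` into `after`.
        step = next(((n, m)
                     for n, pos in enumerate(positions, start=1)
                     for m, (lhs, rhs) in enumerate(grammar, start=1)
                     if before[pos] == lhs
                     and [*before[:pos], *rhs, *before[pos + 1:]] == list(after)),
                    None)
        if step is None:
            raise ValueError(f"{list(after)} is not derived from {list(before)} in one step")
        tape.append(step)
    tape.append("Q")  # halt and accept with the last form
    return tape


EXPR = [("<expr>", ("<expr>", "+", "<term>")),
        ("<expr>", ("<term>",)),
        ("<term>", ("x",))]
derivation = [["<expr>"], ["<expr>", "+", "<term>"],
              ["<term>", "+", "<term>"], ["<term>", "+", "x"]]
print(phase2_tape(EXPR, derivation))  # [(1, 1), (1, 2), (2, 3), 'Q']
```

When several pairs reproduce the same step, any of them will do; the sketch simply takes the first one found.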

3. Discussion. 

As previously mentioned, although conceptually
simple, the GDSE is the underlying model for all of our more
elaborate grammar-driven mechanisms. The GDSE plays a role
for grammar-driven synthesizers analogous to that played by
a Deterministic Push-Down Automaton (DPDA) for parser-based
systems. The fundamental simplicity of grammar-driven
synthesizers arises from the fact that this underlying
mechanism is a direct restatement, with determinism
incorporated, of the very notion of a sequence of steps in a
valid derivation. This simplicity contrasts with the much
more complicated "set of items" construction required to
generate the DPDA associated with a grammar, which makes the
relation between a grammar and its parser very indirect
[Aho and Ullman, 1977]. The GDSE uses the grammar directly
to synthesize words, rather than using it indirectly to
produce a derivative mechanism able to decode words.

We might note that we have allowed the output of the
GDSE to be any valid sentential form, not requiring it to be
composed strictly of terminal symbols. In other words, we
are taking a sentential form, instead of a word, as the
fundamental entity defined by a grammar. It is easy enough
to fix up the mechanism so that, before halting, it checks
the string in the BUFFER for non-terminals and accepts only
if there are none. Our decision not to do so is based on the
philosophy that additional restrictions should not be
introduced so long as the output without them is sensible.
In practical terms, a valid sentential form under a grammar
for a programming language corresponds to a partially
complete, yet well-structured program, with the missing
parts labeled appropriately by non-terminal symbols. In
fact, the ability to deal with such "reasonable" partial
programs is one of the primary advantages of a programming
system based on grammar-driven synthesis.

Retaining this capability yields an even more
interesting property. No problem develops if the GDSE
encounters an undefined non-terminal in the right-hand side
of some production. Once this non-terminal is copied into
the BUFFER it can never be replaced, so once this action has
been taken a word will never be derived. However, the use of
an undefined non-terminal can still yield a class of
sentential forms. In the context of grammars defining
programming languages, this situation might occur if only
some subset of the complete grammar for the target language
were in use. The resulting form would be meaningful, and
would lead to a complete program once the complete grammar
was defined.

Thus, we see that the class of grammar-driven
synthesizers to be described has the ability to deal
intelligently, in a natural way, not only with partial
programs but also with partially complete grammars.

Finally, we note that ambiguous grammars present no
problem for the GDSE. If the input grammar is ambiguous,
this simply means that there is more than one way to
generate at least one sentential form.

The question that remains to be answered is whether
grammar-driven synthesizers can be used to synthesize more
interesting constructs than strings (for instance, some data
structure encoding the algorithm represented by the word).
In addition, it is desirable to use a more human-oriented
input code. In the remainder of this chapter, first the
command and then the synthesis capabilities will be
improved. The resulting mechanisms will, however, inherit
the basic properties of the GDSE, which remains our
fundamental model for grammar-driven synthesis.

D. AN IMPROVED GRAMMAR-DRIVEN STRING EDITOR 

In this section we improve the Phase Two command
mechanism for the GDSE. The R-ARGOT notation is our primary
tool for doing this. This notation provides for a concise
and human-oriented set of rules as the grammar definition,
allows automatic expansion of rule names when there is only
one way for expansion to be done, and provides a framework
for selection of alternative expansion paths by keying the
desired alternative with a mnemonic keystroke. Yet the
regularity of the notation allows synthesis to proceed in a
straightforward, non-recursive fashion, primarily because
the contents of a rule can be accessed by a finite
automaton. These properties are not coincidental, since the
desire to achieve them provided the primary motivation for
restricting the ARGOT notation in the way chosen.

1. Rules and Transformations.

We eventually would like to classify every possible
rule name replacement according to some finitely-expressible
scheme. To this end, we distinguish between the terms
"rule" and "transformation". For BNF notation, each
production can result in one, and only one, transformation
of a non-terminal symbol to a string of symbols. For ARGOT
and R-ARGOT notation, in contrast, each rule may express
more than one such permissible transformation. The limited
nesting of R-ARGOT operators allows us to list all of the
transformations allowed for an R-ARGOT grammar in a finite
list.

In order to further reduce the set of
transformations possible, we introduce a special class of
symbols, assumed to be distinct from both rule names and
terminal strings, which we will call "e-symbols". They serve
as place markers in a sentential form, indicating points
where optional strings formed according to a particular
transformation may be inserted. We will use three classes of
such symbols, with the notation "o(rule name)",
"i(rule name)", and "l(rule name)". The characters "o", "i"
and "l" will be used to encode the exact sort of
transformation by which the symbol can be replaced, and the
rule name argument will allow the mechanism to access the
symbols in the grammar by which they can be replaced. Since
their expansion is optional, for output purposes we may
think of all of these symbols as representing the empty
string. When the buffer is to be copied to output, these
symbols are simply skipped.

With this notation in hand, we examine the four
sorts of R-ARGOT rules: concatenations, alternations,
iterations, and list iterations.

Concatenations involve replacement of the rule name
by a sequence of terminal symbols, rule names, and optional
rule names. These elements must occur exactly in the order
specified in the rule. Any optional rule names are converted
to the e-symbol "o(rule name)" when they are encountered.
Thus, the rule:

array-type: [ packed ] "array" "[" ranges "]" "of" type.

allows replacement of the rule name <array-type> in the buffer by
o(packed) array [ <ranges> ] of <type>

(In this section, we shall delimit rule names in the buffer
with angle brackets so that they cannot be confused with
terminal strings.) If the symbol "o(packed)" is never
replaced, this string would be copied to the output tape
simply as

array [ <ranges> ] of <type>

We see that a concatenation rule explicitly stands for a
single, invariant transformation. Implicit in the existence
of an optional field, however, is an additional
transformation of the form

o(rule name) => <rule name>

The use of an e-symbol has allowed us to express what would
have been one transformation with an indefinite format as
an indefinitely long (but finite) list of transformations,
each of fixed format. This notational trick will be further
used in the next chapter to make the list of transformations
associated with a grammar even more regular.

Alternation rules are always of the form:

name: { name1 | name2 | ... | name-n }

and correspond to n transformations:

<name> => <name1>
<name> => <name2>
...
<name> => <name-n>

Iteration rules correspond to two transformations: that
performed when the rule name is first replaced, and that
corresponding to additional iterations. Thus, a rule of the
form:



name: + name1

corresponds to the two transformations:

<name> => <name1> i(name)
i(name) => <name1> i(name)

List iteration rules similarly consist of two
transformations. A rule of the form:

name: # name1 name2

corresponds to the transformations:

<name> => <name1> l(name)
l(name) => <name2> <name1> l(name)
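The four rule kinds and the transformations they express can be enumerated mechanically. The Python sketch below is illustrative only: it encodes rules as hypothetical dataclasses rather than parsing R-ARGOT text, writes rule names in the buffer as `<name>` and e-symbols as `o(name)`, `i(name)` and `l(name)`, and reads `name2` in a list iteration as the symbol placed before each further element (a separator). All names (`Concatenation`, `transformations`, and so on) are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Concatenation:
    items: List[Tuple[str, str]]   # ("term", text), ("rule", name) or ("opt", name)


@dataclass
class Alternation:
    alternatives: List[str]


@dataclass
class Iteration:
    body: str


@dataclass
class ListIteration:
    element: str
    separator: str


Rule = Union[Concatenation, Alternation, Iteration, ListIteration]
Transformation = Tuple[str, Tuple[str, ...]]


def ref(name: str) -> str:
    return f"<{name}>"   # rule names are delimited by angle brackets in the buffer


def transformations(name: str, rule: Rule) -> List[Transformation]:
    """List every transformation expressed by one R-ARGOT rule."""
    if isinstance(rule, Concatenation):
        rhs: List[str] = []
        optional: List[Transformation] = []
        for kind, value in rule.items:
            if kind == "term":
                rhs.append(value)
            elif kind == "rule":
                rhs.append(ref(value))
            else:  # an optional rule name becomes the e-symbol o(value)
                rhs.append(f"o({value})")
                optional.append((f"o({value})", (ref(value),)))
        return [(ref(name), tuple(rhs))] + optional
    if isinstance(rule, Alternation):
        return [(ref(name), (ref(alt),)) for alt in rule.alternatives]
    if isinstance(rule, Iteration):
        i_sym = f"i({name})"
        return [(ref(name), (ref(rule.body), i_sym)),
                (i_sym, (ref(rule.body), i_sym))]
    if isinstance(rule, ListIteration):
        l_sym = f"l({name})"
        return [(ref(name), (ref(rule.element), l_sym)),
                (l_sym, (ref(rule.separator), ref(rule.element), l_sym))]
    raise TypeError(f"unknown rule kind: {rule!r}")


array_type = Concatenation([("opt", "packed"), ("term", "array"), ("term", "["),
                            ("rule", "ranges"), ("term", "]"), ("term", "of"),
                            ("rule", "type")])
for t in transformations("array-type", array_type):
    print(t)
```

For the `array-type` rule this yields the single invariant transformation plus the implicit `o(packed) => <packed>` transformation contributed by the optional field.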

2. Automatic Synthesis.

Having listed all possible transformations, we may
now determine which of them can be performed automatically.
Given a rule name, the type of rule is effectively
computable from the form of the right-hand side of the rule
alone. If the rule is an alternation, the user must be
consulted in order to determine which of the n possible
transformations is required. If the rule is a concatenation,
there is only one possible expansion. If the rule is a
simple iteration or list iteration, the initial
transformation is required and should be automatically
performed. It may be recalled that predefined rule names
(such as "identifier") are allowed in an R-ARGOT grammar to
symbolize calls to predefined input scanners. Such rule
names do not admit of expansion by rule, but must be
expanded by referral to the predefined scanner, which may
solicit data from the user. Hence, predefined rules cannot
be automatically expanded. There is one other possibility:
the rule name may be undefined. In this case, no expansion
of any kind is possible.

Terminal symbols, by definition, cannot be expanded.
The e-symbols all require user attention, so they also
cannot be automatically expanded.

As a matter of terminology, we may classify symbols
in the buffer as bound, free, or transient.

Bound symbols are those which admit of no further
replacement. Thus, in our system undefined rule names and
terminal symbols are bound.

Free symbols are those which require a decision as
to whether they are to be replaced at all, or by what
transformation they are to be replaced. The free symbols
are thus names for alternation rules and predefined rules,
as well as the e-symbols.

The remaining symbols can be transformed by one, and
only one, transformation, which is not optional. They
represent intermediate steps of a required replacement
sequence and may be automatically replaced without
restricting the range of words which can be formed from the
sentential form currently in the buffer. They may thus be
regarded as "transient", in the sense that they are retained
only until they are recognized and automatically replaced by
their equivalents. The transient symbols in the described
system are names of concatenations, iterations, and list
iterations.
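The three-way classification can be expressed as a small function. This is a sketch under assumptions: buffer symbols are strings, rule names appear as `<name>`, e-symbols as `o(...)`, `i(...)` or `l(...)` (assumed never to collide with terminal strings), and the rule kind of each defined name is supplied in a dictionary. The names `classify` and `KINDS` are hypothetical.

```python
import re
from typing import Dict

E_SYMBOL = re.compile(r"[oil]\(.+\)")  # o(name), i(name) or l(name)


def classify(symbol: str, kinds: Dict[str, str]) -> str:
    """Return "bound", "free" or "transient" for one buffer symbol.

    kinds maps each defined rule name to "concatenation", "alternation",
    "iteration", "list" or "predefined".
    """
    if E_SYMBOL.fullmatch(symbol):
        return "free"          # optional expansion points always need the user
    if symbol.startswith("<") and symbol.endswith(">"):
        kind = kinds.get(symbol[1:-1])
        if kind is None:
            return "bound"     # undefined rule name: can never be replaced
        if kind in ("alternation", "predefined"):
            return "free"      # needs a keystroke or an input scanner
        return "transient"     # concatenation, iteration or list iteration
    return "bound"             # terminal symbol


KINDS = {"program": "concatenation", "statement": "alternation",
         "statements": "list", "identifier": "predefined"}
print([classify(s, KINDS) for s in
       ["<program>", "<statement>", "o(labels)", "begin", "<undefined>"]])
# ['transient', 'free', 'free', 'bound', 'bound']
```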

Since the expansion of transient symbols can only be
done in one way, at the beginning of each Phase Two loop we
would like to search the buffer for transient symbols and
expand each one found, continuing this process until all
symbols are either free or bound. Unfortunately, for
unrestricted R-ARGOT grammars, there is no guarantee that
this process will terminate. If one can start with a
concatenation, iteration, or list iteration rule and reach
the same rule by applying a sequence of rules not including
any optional or alternation rule, the described process may
never terminate. Therefore, we must restrict the grammar so
that no such cycles exist.

Fortunately, the existence or non-existence of such
cycles can be effectively computed given an otherwise
syntactically correct R-ARGOT grammar. This restriction is
the only semantic constraint we place on R-ARGOT grammars
for the remainder of the discussion. The loss in expressive
power is not great. Such cycles correspond to recursive
expressions with no trivial case in BNF-described languages;
once entered, they derive only forms with non-terminals, and
never words.
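Checking for such cycles is an ordinary cycle search in a directed graph. The sketch below assumes (hypothetically) that Phase One has already reduced the grammar to a dictionary mapping each transient rule to the rule names its automatic expansion introduces directly, with optional fields omitted; any name that is not a key (an alternation, predefined or undefined rule) stops autoscanning and so cannot lie on a cycle. It uses a standard three-colour depth-first search; `find_transient_cycle` is an illustrative name.

```python
from typing import Dict, List, Optional


def find_transient_cycle(auto_children: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of transient rules as a list of names, or None.

    auto_children maps each transient rule (concatenation, iteration or
    list iteration) to the rule names its automatic expansion introduces.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {name: WHITE for name in auto_children}
    path: List[str] = []

    def visit(name: str) -> Optional[List[str]]:
        color[name] = GREY
        path.append(name)
        for child in auto_children[name]:
            if child not in auto_children:
                continue                  # free or bound: autoscanning stops
            if color[child] == GREY:      # back edge: a cycle is found
                return path[path.index(child):] + [child]
            if color[child] == WHITE:
                cycle = visit(child)
                if cycle is not None:
                    return cycle
        path.pop()
        color[name] = BLACK
        return None

    for name in auto_children:
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle is not None:
                return cycle
    return None


GOOD = {"program": ["block"], "block": ["statements"], "statements": ["statement"]}
BAD = {"expr": ["term"], "term": ["expr"]}
print(find_transient_cycle(GOOD))  # None
print(find_transient_cycle(BAD))   # ['expr', 'term', 'expr']
```

A grammar for which this returns a cycle would be rejected during Phase One.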

With this restriction, which can be enforced by
checking the input grammar during Phase One, we now may
allow automatic expansion of transient symbols at the
beginning of the Phase Two loop, prior to any further
processing, with the understanding that such expansion is
to be performed until no transient symbols remain. With the
grammar restricted as described, this process must always
terminate. Since the grammar is context-free, the order in
which transient symbols are expanded is of no consequence.
We will refer to the automatic expansion of all transient
symbols until none remain as "autoscanning".

The addition of the autoscanning feature relieves
the Phase Two user of the burden of having to order
expansions that are required by the grammar. The price paid
for this facility is that only those forms can be produced
which consist entirely of bound and free symbols. In the
context of a programming language defined by a grammar, the
system will now synthesize as much of the program as is
syntactically deducible from the part of the program already
created by the user.
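Autoscanning itself is a simple rewriting loop. In this hedged sketch, each transient symbol is assumed to be mapped to the right-hand side of its single required transformation, and the buffer is scanned left to right, re-examining each position after a replacement. Since the order of expansion does not matter for a context-free grammar, any order would give the same result. The safety limit is an addition of this sketch for grammars that have not been checked for cycles; `autoscan` and the example table `AUTOMATIC` are hypothetical and are not taken from the PASCAL grammar.

```python
from typing import Dict, List, Sequence, Tuple


def autoscan(buffer: Sequence[str], automatic: Dict[str, Tuple[str, ...]],
             limit: int = 10_000) -> List[str]:
    """Expand every transient symbol until only free and bound symbols remain.

    automatic maps each transient symbol, e.g. "<program>", to the
    right-hand side of its single required transformation.
    """
    result = list(buffer)
    steps = 0
    i = 0
    while i < len(result):
        rhs = automatic.get(result[i])
        if rhs is None:
            i += 1                     # free or bound symbol: leave it alone
            continue
        result[i:i + 1] = list(rhs)    # position i is re-examined next
        steps += 1
        if steps > limit:
            raise RuntimeError("autoscan did not terminate; grammar has a transient cycle")
    return result


AUTOMATIC = {
    "<program>": ("program", "<identifier>", "<block>", "."),
    "<block>": ("o(variables)", "begin", "<statements>", "end"),
    "<statements>": ("<statement>", "l(statements)"),
}
print(autoscan(["<program>"], AUTOMATIC))
# ['program', '<identifier>', 'o(variables)', 'begin', '<statement>', 'l(statements)', 'end', '.']
```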

As a concrete example, we display the results of
autoscanning the target symbol for the PASCAL grammar:

program <identifier> ( <identifier> l(filelist) ) ;
o(labels)
o(constants)
o(types)
o(variables)
o(subroutines)
begin
<statement>
l(statements)
end.

3. Improved Cursor Control.

The next improvement to be described is a more use- 
ful method of cursor placement. 

From the analysis above, we see that after
autoscanning is performed, the buffer will contain only
bound and free symbols. By definition, the only symbols
requiring Phase Two input data for further expansion are
free symbols, since bound symbols admit of no expansion at
all. It follows that the cursor should always rest on a free
symbol. If there are no free symbols, there are no symbols
left to expand in the buffer, and the loop may be left, the
buffer copied to the output tape, and the algorithm
terminated. In general, however, one or more free symbols
will be left in the buffer at the end of autoscanning. We
wish to allow the user a means to move the cursor between
them, and must also decide what to do after the symbol
indicated by the cursor has been expanded. It should be
clear that cursor movement never has any effect on either
the contents of the buffer or the valid derivations
reachable at any point in the synthesis. The first is true
simply because cursor movement leaves the buffer unchanged,
and the second because of the context-free nature of the
expansion operation.

Accordingly, after autoscanning, if there are any
free symbols left, we allow the user to move the cursor back
and forth by entering zero or more cursor control symbols
(represented by "->" for movement right and by "<-" for
movement left).

The only question remaining is how to position the
cursor initially, and how to reposition it after a symbol is
expanded. We assume that after a symbol is expanded, the
buffer is autoscanned again to remove any new transient
symbols. If the section of the buffer replacing the expanded
symbol now contains one or more free symbols, the cursor is
placed at the leftmost such symbol. Otherwise, it is placed
at the first free symbol in the remaining string of symbols
to the right. If there are none, wraparound takes place and
the cursor is placed at the first free symbol in the old
substring to the left. Initially, the cursor is placed at
the first free symbol in the buffer.
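The repositioning rule and the two cursor commands can be sketched as follows. The sketch assumes that the buffer is a list of symbols, that a predicate `is_free` is available, and that the autoscanned replacement occupies `buffer[start:end]`. It also assumes that "->" and "<-" leave the cursor in place at either end of the buffer, a detail the text does not specify. Calling `reposition` with `start=0, end=len(buffer)` gives the initial placement. All names here are hypothetical.

```python
from typing import Callable, Optional, Sequence


def reposition(buffer: Sequence[str], is_free: Callable[[str], bool],
               start: int, end: int) -> Optional[int]:
    """Cursor position after the autoscanned replacement buffer[start:end].

    Search the replacement, then the symbols to its right, then wrap around
    to the symbols on its left. None means no free symbol is left.
    """
    free = [i for i, s in enumerate(buffer) if is_free(s)]
    for lo, hi in ((start, end), (end, len(buffer)), (0, start)):
        for i in free:
            if lo <= i < hi:
                return i
    return None


def move(buffer: Sequence[str], is_free: Callable[[str], bool],
         cursor: int, step: int) -> int:
    """Handle "->" (step=+1) and "<-" (step=-1); stay put at either end."""
    i = cursor + step
    while 0 <= i < len(buffer):
        if is_free(buffer[i]):
            return i
        i += step
    return cursor


def is_free(symbol: str) -> bool:
    # Hypothetical: e-symbols plus two rule names assumed to be free.
    return symbol.startswith(("o(", "i(", "l(")) or symbol in {"<identifier>", "<statement>"}


BUFFER = ["program", "<identifier>", "o(variables)", "begin",
          "<statement>", "l(statements)", "end", "."]
print(reposition(BUFFER, is_free, 0, len(BUFFER)))  # 1
print(move(BUFFER, is_free, 1, +1))                 # 2
```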

4. Transformation Selection.

Finally, we address the problem of causing an
optional transformation to be applied, once the cursor has
been positioned as desired by the user.

From the discussions above, the cursor must be
resting on a free symbol, that is, on either a predefined
rule name, the rule name for an alternation, or an e-symbol
of type o, i or l. To simplify the command language model,
the entry of a blank is adopted as the uniform means of
indicating that an expansion is to take place at the current
cursor position. If the cursor is at a predefined rule name,
control is then turned over to the indicated predefined
input scanner. If it is at an e-symbol, the appropriate
transformation is made, the result autoscanned, and the
cursor repositioned for another pass through the cycle.
Finally, if the cursor is at the rule name for an
alternation, one of many potential transformations must be
selected. Another symbol is entered, and this is matched to
keystrokes included in the rule body.

Thus, we must extend the R-ARGOT notation to allow
inclusion, for each alternative, of the keystroke which will
trigger it. An alternation now looks like:

statement: { 'a' assignment
           | 'i' if-statement
           | 'w' while-statement
           | 'c' case-statement
           }.

The symbol 'a' will invoke the transformation

<statement> => <assignment>

the symbol 'w' the transformation

<statement> => <while-statement>

and so on.
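Keystroke selection amounts to a table lookup. In this minimal sketch, a keyed alternation is assumed to be stored as a dictionary from keystroke to alternative name. A keystroke that matches no alternative is ignored, in the same way that the GDSE ignores erroneous input. The resulting buffer would then be autoscanned as before. `ALTERNATIONS` and `select_alternative` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical keyed alternation table for the statement rule shown above.
ALTERNATIONS: Dict[str, Dict[str, str]] = {
    "statement": {"a": "assignment", "i": "if-statement",
                  "w": "while-statement", "c": "case-statement"},
}


def select_alternative(buffer: Sequence[str], cursor: int, key: str,
                       alternations: Dict[str, Dict[str, str]]) -> Optional[List[str]]:
    """Apply the alternation transformation keyed by `key` at the cursor.

    Returns the new buffer, or None when the keystroke must be ignored
    (cursor not on an alternation rule name, or no alternative has that key).
    """
    symbol = buffer[cursor]
    if not (symbol.startswith("<") and symbol.endswith(">")):
        return None
    choice = alternations.get(symbol[1:-1], {}).get(key)
    if choice is None:
        return None
    return [*buffer[:cursor], f"<{choice}>", *buffer[cursor + 1:]]


print(select_alternative(["begin", "<statement>", "end"], 1, "w", ALTERNATIONS))
# ['begin', '<while-statement>', 'end']
```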



Extensions to this simple system are easy to
implement and desirable. In particular, a string of more
than one character could be allowed as a key. Some work has
been done in allowing a "fall-through" key, symbolized by
'!', which invokes the indicated transformation upon any
symbol which does not occur anywhere else in the list of
alternative keys, and reapplies the entered symbol to the
next alternative generated. Such enhancements are not
considered further in the present work.

Thus, the only data which must be entered during
Phase Two are cursor control commands, which leave the
synthesized string intact but move the cursor, and
invocations of transformations, which consist of a single
blank followed by nothing for e-symbol expansions (lists,
iterations, or optional field inclusion), by a
context-dependent keystroke for alternative selection, or by
whatever is needed by the appropriate input scanner for such
items as identifiers, numbers, and the like.

5. Discussion.

We have now enhanced the capabilities of the GDSE on
the input side to allow string synthesis driven by a
human-oriented grammar, with a reasonably supple means of
cursor control and transformation selection. The resulting
mechanism still has the desirable properties of the GDSE: it
can accept virtually any context-free grammar (we have lost
those which contain irreducible recursions) and generate any
form derivable under that grammar (some of which are
automatically expanded). It is also still true that the
buffer never contains an incorrect sentential form.

The mechanism that has been described in this
section is considerably simpler than that for a parser
generator. This simplicity is the result of allowing
interaction between the user and the synthesizer during the
stage when the grammar of the language is available to the
mechanism. User-provided data is available to guide a true
top-down synthesis of the desired word in the defined
language.

The described system is highly useful in its own
right. It could be used, for instance, to prepare programs
for entry into a conventional system with the guarantee that
the program was syntactically correct. The compiler used
would not need the ability to handle syntactic errors (a
notably difficult design problem). In addition, since the
input grammar is interpreted, the same editor could be used
for many different languages.

We want to do more, however. In the next section,
we investigate one way to synthesize more complicated data
structures using the grammar-driven editor we have described
in this section.

E. TREE SYNTHESIS 

So far, all of the mechanisms described synthesize
strings. In order to subsume the ideas already developed
under the general notion of tree synthesis, we first
characterize strings as a special sort of tree. We then
discuss the notion of parse trees, and generalize it to form
the broader class of derivation trees, of which both string
trees and parse trees are special cases. Since trees are a
well-understood data structure, we shall not define them
formally but treat their general properties in an intuitive
fashion. For the remainder of this section we shall assume
that the algorithms necessary to create and manipulate
generalized (multi-children), ordered trees are freely
available. Such trees consist of a finite number of nodes,
each of which has a finite number of children occurring in
an ordered sequence.

In addition to having children, we assume that each node
may also contain an indefinite amount of symbolic
information. In particular, with each node may be associated
a string called its label.

Those nodes of a tree with no children are its leaf
nodes. Since the tree is ordered, its leaf nodes may also
be ordered into a linear list. We assume that all of the
nodes of a synthesized tree may be examined and accessed for
the information they may contain.

1. Re-Interpretation of the GDSE.

In all of the work that follows, we use a
synthesizer that is formally identical to the GDSE. We shall
call such a mechanism a GDE, for Grammar-Driven Editor. The
actions taken by those steps in the algorithm that actually
interact with the BUFFER are re-interpreted as calls to
tree-manipulation subroutines. The BUFFER is now conceived
to contain, not strings of symbols, but appropriately
implemented ordered trees with labeled nodes. Rather than
describing in detail the algorithms needed to create,
modify, and traverse such structures, we assume that
mathematically correct subroutines are available to perform
the needed functions, since methods for implementing trees
using a sequentially-addressed, rewritable memory store are
well known.

In order to re-interpret the improved GDSE as a tree
synthesizer in this way, we need routines to initialize the
BUFFER with a target tree (or initial tree), move the cursor
back and forth, and replace a "symbol" with a "string of
symbols" (whatever these terms mean in the new context). We
also now need to identify explicitly the precise means used
to "display" a tree.

Supposing that appropriate routines are available,
we wish to argue that the new mechanism, which synthesizes
trees instead of strings, inherits all of the formal
properties of the original, in the following sense.

The display algorithm in use may be thought of as a
function, d, mapping trees into strings. We shall consider
a tree to be a "sentential form" of the input grammar of
interest if, and only if, its image under d is a string
which is a sentential form of the grammar.

We wish to compare the operation of the old and the
new mechanisms, given exactly the same stream of input
symbols on the PHASE2 INPUT tape, supposing that the grammar
specifications on the PHASE1 INPUT tape are equivalent in
some as yet unspecified sense. The fundamental property
that gives the GDSE all of the features that make it an
appropriate synthesizer for sentential forms is that at each
entry to the loop, the BUFFER always contains a correct
form. This property is a consequence of the fact that the
manipulations inside the loop either leave the contents of
the buffer unchanged, or transform one valid form to
another. Since the BUFFER is initialized with a valid form,
by induction the BUFFER never contains anything but a valid
form upon loop entry.

We would like the new mechanism to perform the same
derivation steps, given the same PHASE2 input sequence, as
the old. The display function would then serve as a
morphism from the new mechanism to the old, over the
operations defined by the possible BUFFER transactions made
available by the algorithm within its basic loop. Thus,
suppose that for any given cycle through the loop by the
parallel mechanisms, starting with identical forms in the
two BUFFERS (as viewed under the display function for the
new mechanism), corresponding derivations are undertaken
within the loop. Then for every possible derivation
sequence that can occur under the old mechanism there will
be one, and only one, derivation sequence which occurs under
the new mechanism, and the product of the new mechanism,
when viewed under the display function, will be identical to
that of the old.

The question of paramount interest, then, is under
what circumstances this property, that the contents of both
BUFFERS will be display-equivalent at every step in
equivalent machines, will be true.

It is well outside the scope of our research to
provide a complete answer to this question, in the form of a
set of necessary and sufficient constraints under which the
desired property (which we might call "stepwise
equivalence") holds. Rather, we shall describe in general
terms a natural class of re-interpretation constraints that
are merely sufficient.

In the improved GDSE, the PHASE1 INPUT tape
contained a finite set of rules, each of which consisted of
a finite set of transformations with one symbol on the
left-hand side and a string of symbols on the right-hand
side. In the re-interpreted synthesizer, each transformation
will consist of a specification calling for the replacement
of a single leaf node, labelled with the symbol on the
left-hand side of the original transformation, with a forest
of adjacent siblings whose leaf nodes are labelled with each
of the symbols on the right-hand side. Such a tree
transformation specification will be referred to as a
template. "Replacement of a symbol by a string" is now taken
to mean the replacement of a labelled leaf node by the
forest of adjacent siblings specified by the appropriate
template.

In order to ensure that the structure in the BUFFER 
is always a tree* (since *e may allow replacement of a node 
by a forest), it is necessary to ensure that the root node 
in the BUFFER is never broken up into a forest. We there- 
fore impose the constraint on the system that the BUFFER be 
initialized with a tree consisting of a special root node 
with one child, labeled with the target symbol. Since only 
leaf nodes are ever replaced, no replacement ever turns a 
previously internal node into a leaf node (no transforma- 
tions have empty right-hand sides). Since the root node is 
initially internal, it is never replaced. Hence the struc- 
ture in the BUFFER is always a bona fide tree. 
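As an illustration only, the BUFFER discipline just described can be sketched in Python. The `Node` class and the helpers `init_buffer`, `replace_leaf`, and `leaves` are hypothetical names, and the sketch uses the simplest possible templates (forests of single-node siblings). It shows that, with non-empty right-hand sides, the special root always keeps at least one child and so is never eligible for replacement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by contents
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def is_leaf(self) -> bool:
        return not self.children


def init_buffer(target: str) -> Node:
    """Initialize the BUFFER: a special root with one child labelled by the target symbol."""
    root = Node("<root>")
    root.children.append(Node(target, parent=root))
    return root


def replace_leaf(leaf: Node, rhs: List[str]) -> List[Node]:
    """Replace a leaf by a forest of adjacent single-node siblings labelled by rhs."""
    if leaf.parent is None or not leaf.is_leaf():
        raise ValueError("only leaf nodes below the root may be replaced")
    if not rhs:
        raise ValueError("transformations never have empty right-hand sides")
    parent = leaf.parent
    i = parent.children.index(leaf)
    forest = [Node(sym, parent=parent) for sym in rhs]
    parent.children[i:i + 1] = forest  # splice the forest into the leaf's place
    return forest


def leaves(node: Node) -> List[str]:
    """The sentential form held in the BUFFER: leaf labels, left to right."""
    if node.is_leaf():
        return [node.label]
    return [lab for child in node.children for lab in leaves(child)]


if __name__ == "__main__":
    buf = init_buffer("<statement>")
    replace_leaf(buf.children[0], ["if", "<expression>", "then", "<statement>"])
    print(leaves(buf))    # ['if', '<expression>', 'then', '<statement>']
    print(buf.is_leaf())  # False: the root stays internal
```

Because `replace_leaf` rejects empty right-hand sides, every replacement leaves its parent with at least one child, which is exactly the argument given above for the root.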

The above suppositions are insufficient by themselves to
obtain the stepwise equivalence property, since we have not
addressed the display function, which is used to define what
is meant by a tree which is a valid sentential form.

In the final system to be described, the language
implementer will be given the power both to select a
particular template from all of the valid candidate
templates available corresponding to the given
transformation, and also to influence the display order of
the children of a given node. The retention of stepwise
equivalence depends jointly on the consistent application of
these two facilities, and it is our present intention to
provide a sufficient condition which does, in fact,
preserve it.

Selection of a single template for each transformation
in the original grammar may be thought of as specifying a
function mapping transformations into templates. Let us
name this function f.

In the work immediately following, the display
algorithm, which we denote d, will be very simple. A tree is
displayed by listing the labels of all of its leaf nodes in
order. Since the right-hand sides of templates are ordered
forests, we may also speak consistently of applying d to a
template: again, we simply list all of the leaf node labels
in order. The required constraint is simply this: f and d
must be inverse functions on the set of transformations in
the grammar and the set of selected templates. That is, each
template must display as the transformation to which it
corresponds. Finally, movement of the cursor back and forth
is to be interpreted as movement of the cursor from leaf
node to leaf node, as ordered under the display function.
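A minimal sketch of the constraint on f and d, assuming trees are represented as hypothetical `(label, children)` pairs and that f is given as a dictionary from transformations to templates. The `IF` head node is invented for illustration; any interior structure is allowed, provided the leaves display as the right-hand side of the transformation.

```python
from typing import Dict, List, Tuple

# A tree is (label, children); a template's right-hand side is an ordered forest.
Tree = Tuple[str, List["Tree"]]
Transformation = Tuple[str, Tuple[str, ...]]  # (left-hand symbol, right-hand symbols)


def d(forest: List[Tree]) -> Tuple[str, ...]:
    """Display function: the labels of all leaf nodes, in left-to-right order."""
    out: List[str] = []
    for label, children in forest:
        if children:
            out.extend(d(children))
        else:
            out.append(label)
    return tuple(out)


def displays_correctly(f: Dict[Transformation, List[Tree]]) -> bool:
    """Sufficient condition: each selected template displays as its transformation."""
    return all(d(template) == rhs for (_lhs, rhs), template in f.items())


IF_RULE: Transformation = ("<stmt>", ("if", "<expr>", "then", "<stmt>"))

# Hypothetical template: an "IF" head node above the four right-hand symbols.
good = {IF_RULE: [("IF", [("if", []), ("<expr>", []), ("then", []), ("<stmt>", [])])]}
# The same nodes with two children permuted display as a different string.
bad = {IF_RULE: [("IF", [("<expr>", []), ("if", []), ("then", []), ("<stmt>", [])])]}

if __name__ == "__main__":
    print(displays_correctly(good), displays_correctly(bad))  # True False
```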

Under these conditions, stepwise equivalence will be
retained by the new mechanism. The fundamental reason for
this is that the display algorithm defined is itself
"context-free": the display of a tree is the concatenation
of the displays of its subtrees, so replacing one leaf
changes only that leaf's contribution. If a given tree is a
sentential form, application of a template to it will yield
a tree which is also a sentential form. Moreover, the new
tree will display as the same form as that yielded by the
corresponding symbol replacement applied by the string
synthesizer. Cursor movement also takes place in parallel.
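The lockstep argument can be exercised on a single derivation. The sketch below uses hypothetical helper names, and its templates simply hang the right-hand side beneath the chosen leaf. It runs a string synthesizer and a tree synthesizer side by side and compares their displays after every step; it demonstrates the property on one example and is not a proof.

```python
from typing import List, Sequence, Tuple

# A tree node is a mutable list [label, children]; leaves have no children.


def leaf_nodes(tree: list) -> List[list]:
    """All leaf nodes, left to right."""
    _label, children = tree
    if not children:
        return [tree]
    return [leaf for child in children for leaf in leaf_nodes(child)]


def display(tree: list) -> List[str]:
    """Context-free display: concatenate the displays of the subtrees."""
    return [leaf[0] for leaf in leaf_nodes(tree)]


def string_step(form: List[str], k: int, rhs: List[str]) -> List[str]:
    """String synthesizer: replace the k-th symbol of the form by rhs."""
    return form[:k] + rhs + form[k + 1:]


def tree_step(tree: list, k: int, rhs: List[str]) -> None:
    """Tree synthesizer: apply a template that hangs rhs beneath the k-th leaf."""
    leaf_nodes(tree)[k][1].extend([sym, []] for sym in rhs)


def run_in_lockstep(target: str, steps: Sequence[Tuple[int, List[str]]]) -> bool:
    """True if both synthesizers display identically after every step."""
    form = [target]
    tree = ["<root>", [[target, []]]]
    for k, rhs in steps:
        form = string_step(form, k, rhs)
        tree_step(tree, k, rhs)
        if display(tree) != form:
            return False
    return True


STEPS = [(0, ["if", "<expr>", "then", "<stmt>"]), (1, ["x"]), (3, ["skip"])]

if __name__ == "__main__":
    print(run_in_lockstep("<stmt>", STEPS))  # True
```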

Since the new mechanism is stepwise equivalent to
the old, it inherits all of the formal properties of the
old. Of course, since the actual contents of the BUFFER may
be substantially richer in structure at any given time, the
new mechanism may have emergent properties of its own in
addition to those inherited from the GDSE, but such
properties can be utilized only by using an additional
algorithm to access information that has been hidden in
internal nodes of the tree in the BUFFER.

A more flexible display algorithm will be used in
the final system. The implementer will have the power to
permute the display order of the nodes in a template, as
well as to display strings stored with the rule instead of
as labels of a node. The display algorithm retains the
basic property of providing a context-free display, however,
and the same constraint applies to the display and template
specifications chosen: each template must, in fact, display
as its corresponding transformation in order for the system
to maintain stepwise equivalence.
2. Strings as Trees

We may think of a string as a special sort of tree
which has a root node and one child for each symbol in the
string. Such a two-level tree we shall call a string tree.
For instance, the string

   "if <expression> then <statement> o(else-part)"

corresponds to the string tree

                        <root>

   if   <expression>   then   <statement>   o(else-part)

In order to synthesize string trees with a GDE, we
initialize the BUFFER with the tree

   <root>

   <target>

Replacement of a symbol by a string of symbols is
redefined as the replacement of a leaf node by a set of
adjacent sibling nodes, fitted into the place of the
replaced node in the ordered list of leaf nodes. In other
words, the template corresponding to a given transformation
is just an ordered forest of single-node trees.

The resulting GDE, although it does synthesize
trees, constitutes a system that is isomorphic to the GDSE.

3. Parse Trees.

The concept of a parse tree occurs frequently in the
theory of context-free grammars.

We can view parse trees as the structures
synthesized by another re-interpretation of the basic
grammar-driven synthesizer. The initial tree is taken to be
the same two-node tree as for the case of string trees. The
notion of replacement of a symbol by a string is
re-interpreted as the addition of children to a leaf node,
labeled with all the symbols of the string. In other words,
templates always take the form of a tree, with the root node
labeled with the left-hand side of the transformation, and
each child labeled with the appropriate symbol from the
right-hand side. As usual, the "string" in the BUFFER is
the ordered list of leaf nodes. The resulting structure is
considerably richer than that retained in the BUFFER by the
GDSE, since once a node is created, it is never removed.
(More accurately, if it is removed while a leaf node, it is
immediately replaced by a copy of itself.)

4. Comparison of String Trees and Parse Trees.

We take the view that string trees and parse trees
are two special cases of a whole range of trees that can
represent a particular sentential form. This observation
can be justified by comparing the properties of the two
types of trees. A string tree incorporates the minimum
amount of historical information concerning the derivation
sequence by which it was produced: just enough for further
derivation to proceed correctly. As a result, string trees
are very compact.
Parse trees, on the other hand, incorporate a very
large amount of information concerning the derivation
sequence by which they were produced: enough that the
entire sequence can be reconstructed (up to the permutation
of commutative non-terminal selection). As a result, parse
trees are very large. As a concrete example, Figure 1 in
Appendix H contains the parse tree for a trivial PASCAL
program.
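To make the size contrast concrete, the following sketch synthesizes the same sentential form once as a string tree and once as a parse tree, then counts nodes. The names are hypothetical and the derivation is an invented PASCAL-like example, not the one shown in Figure 1.

```python
from typing import List, Tuple

Step = Tuple[int, List[str]]  # (index of the leaf to expand, right-hand side)


def leaf_nodes(tree: list) -> List[list]:
    """All leaf nodes of a [label, children] tree, left to right."""
    if not tree[1]:
        return [tree]
    return [leaf for child in tree[1] for leaf in leaf_nodes(child)]


def synthesize(target: str, steps: List[Step], keep_history: bool) -> list:
    root = ["<root>", [[target, []]]]
    for k, rhs in steps:
        new = [[sym, []] for sym in rhs]
        if keep_history:
            # Parse tree: the k-th leaf acquires children and is never removed.
            leaf_nodes(root)[k][1].extend(new)
        else:
            # String tree: every leaf is a child of the root, so splice in place.
            root[1][k:k + 1] = new
    return root


def count_nodes(tree: list) -> int:
    return 1 + sum(count_nodes(child) for child in tree[1])


# Hypothetical derivation of a trivial PASCAL-like program.
STEPS: List[Step] = [
    (0, ["program", "<id>", ";", "<block>", "."]),
    (1, ["p"]),
    (3, ["begin", "<stmt>", "end"]),
    (4, ["<assign>"]),                 # a chain step: one symbol for another
    (4, ["<var>", ":=", "<expr>"]),
]

if __name__ == "__main__":
    s = synthesize("<program>", STEPS, keep_history=False)
    p = synthesize("<program>", STEPS, keep_history=True)
    print([x[0] for x in leaf_nodes(s)] == [x[0] for x in leaf_nodes(p)])  # True
    print(count_nodes(s), count_nodes(p))  # 10 15
```

Both trees display identically, but the parse tree keeps an interior node for every expanded symbol, including the chain step from `<stmt>` to `<assign>`.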

Our eventual goal is to provide for grammar-driven
synthesis of directly evaluable trees of reasonable size. A
secondary goal is to do this in such a way that the
resulting tree can be displayed as a program in the language
in which it was created, but can be evaluated without any
additional syntactic access.

Neither string trees nor parse trees are suitable
constructs for achieving these goals. String trees
incorporate no structural information and must be reparsed
in order to access their semantic contents in the correct
order. (This process may even be impossible if the string
tree was synthesized under an ambiguous grammar.) Too much
information has been discarded at the time of synthesis.

On the other hand, parse trees are unreasonably
large. Most of the nodes record syntactic information
that is semantically content-free.
Our task, therefore, is to find a way to reach some
middle ground, synthesizing trees which contain enough nodes
to retain the desired control structure, but allowing the
elimination of nodes which have no semantic content.

The purpose of the present section is not to provide
a complete description of how this is to be done, but to
provide a conceptual range of intermediate possibilities.
It will then be possible to choose intelligently the sort of
tree to be synthesized to meet a particular requirement.
In short, we wish to introduce some "engineering slack" into
the formal system.

This purpose is realized by introducing the notion
of derivation trees, a general concept of which both parse
trees and string trees are special cases.

5. Derivation Trees.

One way to characterize the structure of a parse
tree is to note that every parent node in the tree derives
its children in exactly one step. Thus, the relation
between parents and children in the tree is the same as the
"=>" relationship.

We consider the set of trees in which each parent
derives its children in zero or more steps; that is, trees
which incorporate the "*=>" relationship.

Such trees may be constructed from a parse tree in
the following manner:

a. Mark the root and leaf nodes.

b. Mark zero or more of the remaining nodes.

c. Discard each unmarked node. Every time a
node is discarded, replace it within the
set of its siblings by all of its children,
taken now as adjacent siblings. (This
procedure preserves the relative ancestry
of all undiscarded nodes.)

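The above procedure ensures that every remaining
node derives its new children in zero or more steps. This
can be seen by noting that the hypothesis is true for the
original parse tree, and that if it holds for a discarded
node and its children, it also holds for that node's parent
and the parent's new children after each application of the
third step: the parent derives a string containing the
discarded node, which in turn derives its children, and
since the grammar is context-free the parent derives the
spliced string. Hence, it is true for the resulting tree.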
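Steps a through c can be sketched as a recursive splice over a tree of hypothetical `[label, children]` lists. Here the marking of step b is supplied as a predicate `keep` on labels; this is one assumed way of expressing the non-deterministic choice, not part of the original procedure.

```python
from typing import Callable, List

# A tree node is [label, children]; leaves have an empty child list.


def prune(node: list, keep: Callable[[str], bool]) -> list:
    """Keep this node (the root when called at top level) and prune below it."""
    label, children = node
    return [label, [kept for child in children for kept in _splice(child, keep)]]


def _splice(node: list, keep: Callable[[str], bool]) -> List[list]:
    label, children = node
    if not children or keep(label):  # a leaf, or a marked interior node
        return [prune(node, keep)]
    # Unmarked interior node: its (pruned) children become adjacent siblings.
    return [kept for child in children for kept in _splice(child, keep)]


def leaf_labels(node: list) -> List[str]:
    label, children = node
    if not children:
        return [label]
    return [x for child in children for x in leaf_labels(child)]


PARSE = ["<root>", [
    ["<stmt>", [
        ["if", []],
        ["<expr>", [["<term>", [["x", []]]]]],
        ["then", []],
        ["<stmt>", [["skip", []]]],
    ]],
]]

if __name__ == "__main__":
    print(prune(PARSE, lambda _label: False))
    # ['<root>', [['if', []], ['x', []], ['then', []], ['skip', []]]]
```

A `keep` that is always true reproduces the parse tree, and one that is always false yields the string tree; in every case the leaf sequence is unchanged.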

In the procedure just specified, the selection of
interior nodes to be retained is done non-deterministically.
It is the specification of the particular algorithm to be
used for selecting nodes for retention that we make
available to the system implementer as an engineering choice.
The two simplest algorithms are to retain all interior
nodes, in which case parse trees are produced, or to discard
all interior nodes, in which case string trees are produced.

The trees produced by the procedure just described
we call generalized derivation trees. Our goal, however, is
not to produce a full parse tree and only then to prune it,
but to synthesize a pruned derivation tree directly as we go
along.

This desire suggests that we apply a particular
synthesis action uniformly, in the sense that with each
transformation implicit in the R-ARGOT grammar there be
associated one, and only one, synthesis action. This
suggestion is not quite a necessary implication: one could
conceive of some history- or context-dependent algorithm for
selecting one of several predefined synthesis actions
associated with a transformation. In fact, such
"intelligent" systems are an interesting subject for future
research.

But if the simpler protocol is adopted, we obtain a
sub-class of derivation trees, which we call derivation
trees constructed by rule. Both parse trees and string
trees are also members of this class. Hereafter, the term
"derivation tree" will be understood in this restricted
sense.

The association of one, and only one, template with
each transformation is very clearly an embodiment of this 
idea. The GDE previously described is thus a mechanism 
capable of synthesizing any class of uniform derivation 
trees desired for a given grammar in R-ARGOT. 

In essence, the next chapter represents the
selection of further constraints on the template formats to
be associated with each type of transformation, in such a
way that our design goals are achieved. The trees produced
under this set of protocols are a particular sort of
derivation tree constructed by rule, which we shall
hereafter call abstract syntax trees. This name is adopted
from the ideas contained in [McKeeman 1970] as representing
an intermediate stage in the translation of some program, in
which a parse tree has had its syntax-dependent, semantically
void interior nodes pruned away.

6. Elimination of Terminal Strings in Derivation Trees.

An inspection of parse trees such as the one
displayed in Figure 1 suggests three general classes of
nodes for elimination: those representing a series of
production steps needed to fill a high-level slot with a
low-level construct (so-called "empty productions"), those
encoding options available but not so far taken (e-symbols),
and those representing keywords and punctuation.

As the next chapter shows, selection of appropriate
template protocols allows removal of nodes representing
empty productions. It is our belief that nodes of the
second type can also be eliminated by appropriate template
selection and context-sensitive computation of the
existence of a "virtual" option.

We now investigate a methodology for eliminating
most nodes required to hold terminal strings.

We first make the observation that most such nodes
are semantically content-free. An examination of the
R-ARGOT notation will show that terminal symbols can only be
added to a synthesis in one of two ways: by means of a
concatenation or list-iteration transformation, or by means
of a predefined (autoparsed) rule name expansion. In the
second case, the included string may well be meaningful,
e.g. if it is an identifier or the like. In the former
case, however, since the required terminal string cannot be
an optional field, there is no choice as to whether the
string is or is not included. If such a choice existed,
it must have been made via an earlier option or alternative
selection, and by the template protocols specified in the
next chapter, this selection is already encoded into the
structure of the tree. There is thus no reason to add a
node to the tree simply to represent an invariant field.

On the other hand, in order to be usable we must be
able to display the string as if it were a node in the tree.
The solution to this quandary is to make provision for
computing the location and contents of such virtual fields
when the need arises. This can be done provided that list
and concatenation rule templates always have a single head
node which can be associated in some way with the specific
rule from which they were derived (either by inserting a
reference to the rule into the node, or by computing the
rule from context). If the contents of the virtual fields
associated with the rule are then stored with the rule, we
can avoid repeating these strings throughout the derivation
tree.
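A sketch of the virtual-field idea, assuming a hypothetical rule table in which each concatenation rule lists its invariant terminals and its nonterminal slots in order. Only the head node, which records its rule, and its nonterminal children are stored; the terminals are recomputed from the rule at display time.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical concatenation rules. Each item is ("t", text) for an invariant
# terminal field or ("n", rulename) for a nonterminal slot.
RULES: Dict[str, List[Tuple[str, str]]] = {
    "while_stmt": [("t", "while"), ("n", "expr"), ("t", "do"), ("n", "stmt")],
}


class Head:
    """Head node: records the rule that bound it and holds only nonterminal children."""

    def __init__(self, rule: str, children: List["Child"]):
        self.rule = rule
        self.children = children


Child = Union[Head, str]  # a str stands for an unexpanded leaf or user-typed text


def display(node: Child) -> List[str]:
    """Interleave the rule's stored terminals with the displays of the children."""
    if isinstance(node, str):
        return [node]
    kids = iter(node.children)
    out: List[str] = []
    for kind, text in RULES[node.rule]:
        out.extend([text] if kind == "t" else display(next(kids)))
    return out


def count(node: Child) -> int:
    """Number of nodes actually stored in the tree."""
    return 1 if isinstance(node, str) else 1 + sum(count(c) for c in node.children)


if __name__ == "__main__":
    loop = Head("while_stmt", ["n > 0", Head("while_stmt", ["busy", "skip"])])
    print(" ".join(display(loop)))  # while n > 0 do while busy do skip
    print(count(loop))              # 5
```

Seven symbols are displayed for the nested loop, but only five nodes are stored; the keywords `while` and `do` live once in the rule table.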
These ideas are more concretely discussed in the
protocols for template construction in the next chapter. 

F. COMPARISON OF GRAMMAR-UTILIZATION TECHNOLOGIES

It is appropriate at this point to step back and place
the system of grammar utilization described in this chapter
within the range of currently available technologies for
grammar utilization. We shall compare this system with the
two common parsing techniques: bottom-up and top-down
parsing. All three of these techniques may be thought of as
producing derivation trees as output.

It should be recognized that the tree produced by a
parser in contemporary translation systems is usually
"virtual". The parser emits a series of syntax-directed
action commands which may be thought of as the sequential
representation of a post-order traversal of a derivation
tree. The "back end" of the system may be thought of as
traversing behind the parser, destroying nodes as quickly as
they are built.

Both of the parsing techniques are designed to proceed
automatically, that is, without any human intervention. The
grammar-driven synthesizer, in comparison, is inherently
interactive. This property is both an advantage and a
disadvantage, in that the synthesizer utilizes interaction
to attain desirable goals, but cannot be implemented without
interactive devices being available.
The need for the parser-oriented techniques to proceed
automatically places a set of mathematical constraints on
the grammars usable by such systems. The grammar-driven
synthesizer is capable of utilizing almost any context-free
grammar, a capability that allows the language designer to
optimize the grammar selected for realizing some programming
language towards a set of semantically natural rules which
will be easy for the human user to understand.

The parser-based systems are essentially decoders,
translating a valid word in the defined language into a more
complicated, but equivalent, structure. Inherent in this
process is the requirement for the user to use some other
system, such as a keypunch or text editor, to formulate a
valid input word in sequential form, a notoriously
error-prone and tedious process. In contrast, the
grammar-driven synthesizer allows the user to create the
desired tree structure directly and with no possibility of
syntactic error (since such errors are simply rejected
immediately).

Finally, we note that both parsing techniques synthesize
the output tree from the bottom up. The grammar-driven
synthesizer follows a true top-down synthesis: thus, the
partially complete structure is completely well-structured
so far as it goes. The system is for this reason well
suited as a base for dealing with partially complete
programs.






III. CONCEPTUAL DESIGN FOR GDE



A. INTRODUCTION 

In this chapter a conceptual design for a Grammar
Directed Editor is developed within the framework defined in
Chapter II.

The mathematical model provides a large framework in 
which to design a Grammar Directed Editor, subject to the 
following restrictions: 

1. Grammar rules are limited to the concatenation, 
alternation, iteration, list, predefined, and undefined 
rules in the forms specified by the R-ARGOT notation. 

2. The templates associated with these grammar rules 
may consist of arbitrary forests of siblings, the leaves of 
which must be labelled in accordance with the transforma- 
tions summarized in Figure 2. 

3. The templates for list and concatenation rules which 
include terminal symbols must create head nodes which retain 
or refer to those terminal symbols for display. 

A Grammar Directed Editor constructed in accordance
with these restrictions will produce a derivation tree whose
leaves, together with the terminal symbols retained in head
nodes, are displayable as a valid derivation of the input
grammar.

The following design restrictions and goals serve as a 
basis for limiting the very general nature of the possible 
templates to a set of generic templates which define the
permissible transformations available for the construction 
of an Abstract Syntax Tree (AST): 

1. The AST should contain the minimum number of nodes 
consistent with the retention of all necessary semantic and 
schematic information. 

2. The structure of the AST should admit efficient
editing algorithms, in particular for append, delete, and
insert functions.

3. The AST should not only be an evaluable structure,
but further it should require no "preprocessing" between
editing and evaluation operations.

4. The generic transformation template structure should
be such that the creation of specific templates for a given
grammar can be automated over the simplest possible input
data, perhaps as simple as a grammar in a suitable notation.

The methodology employed in the design process described
in the following section is to apply, working within the
constraints which the mathematical model suggests, such
further constraints and definitions as may be necessary to
develop generic templates for each transformation which
realize the design goals. In Section C, a method for
displaying the AST is developed which is consistent with the
generic templates as well as with the requirement that the
valid derivation which the AST represents be displayable as
such. Section D introduces the notion of a Language
Definition, wherein an R-ARGOT grammar is translated into an
ordered collection of transformation templates and display
schemas which serves as the basis for the construction and
display of an AST.

B. TRANSFORMATIONS

1. Operators and Rulenames

Figure 2 is the result of precisely defining the
leaves produced by each of the transformations defined in
Chapter II.

A simple change in notation produces Figure 3,
wherein every rulename in a transformation is associated
with an operator to form a two-part label, as follows:

   <r>      =  NT,r
   copt(r)  =  COPT,r
   iopt(r)  =  IOPT,r
   lopt(r)  =  LOPT,r
   pdf(p)   =  PDF(p),p

where r is any grammar rulename and p is any predefined
rulename. The first part of a label, the operator, will
guide future transformations. The second part, the
rulename, serves as a reference to that section of the
language-specific data base containing the information
required for performing transformations or display. In
other words, labels may be thought of as a self-modifying
"program" for the Grammar Directed Editor, stored in the
hierarchical AST structure by previous versions of the
program, encoding all of the information necessary for
subsequent modification or display of the structure.

Note that, as a result of the notational convention
adopted here, the set of possible labels is finite over
a finite set of grammar rules and, therefore, the set of
templates required for such a grammar is also finite.
Further, the type of transformation which may be applied to
a given node is determined entirely by the operator and rule
type association stored within that node.

The alternation and predefined transformations
present a problem, however: although the "NT" opcode is
usually stored in transient nodes, these two particular
transformations must be stored in free nodes. The
alternation requires that the user select one of the
possible alternatives, and the predefined functions require
that the user input a string which they then process. This
irregularity is resolved by the introduction of two new
operators, ALT and TERM, and the following pairs of
transformations:



   NT,a    =>  ALT,a
   ALT,a   =>  { NT,r1 ! ... ! NT,rn }

   NT,p    =>  TERM,p
   TERM,p  =>  PDF(p),p






The operators "ALT" and "TERM" may be thought of as
logically equivalent to "NT", but as explicitly labelling
(for display purposes) the nodes as free (for synthesis
purposes). Figure 4 reflects these modifications to the
general transformation table.

The introduction of the two new labels ALT,a and
TERM,p, while not altering the leaves produced by the
original transformations, and thus not invalidating the
application of the mathematical model's results to systems
based on this extension, provides the following benefits:

a. The format for the five defined types of template
sets is more regular. At least two transformations are
associated with each rule type. The first of these
transformations is, in every case, a required
transformation. The second and following transformations
require some form of interaction with the user.

b. Every node whose label has an "NT" operator may
be automatically expanded during the autoscan process.
Thus, after autoscan, the only leaves whose labels contain
the "NT" operator will be those corresponding to undefined
rules.

c. Since for every unique label there is one and
only one transformation possible, no contextual information
need be extracted from the AST in order to select and
perform the correct transformation. This simplifies the
tasks of both language implementation and AST formation,
since production and invocation of a transformation template
is independent of any AST contextual considerations.
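A sketch of benefit c, assuming a hypothetical grammar fragment and labels represented as `(operator, rulename)` tuples: `transform` chooses its action from the label alone, and `autoscan` keeps expanding while the operator is NT and the rule is defined. Only the alternation and predefined cases are covered, and the string that a predefined function would process is not stored.

```python
from typing import Dict, List, Optional, Tuple

Label = Tuple[str, str]  # (operator, rulename)

# Hypothetical grammar fragment: rulename -> (rule type, alternatives if any).
GRAMMAR: Dict[str, Tuple[str, List[str]]] = {
    "stmt": ("alt", ["assign", "while_stmt"]),
    "ident": ("pdf", []),
    "assign": ("undef", []),
    "while_stmt": ("undef", []),
}


def transform(label: Label, user_input: Optional[str] = None) -> Label:
    """Apply the unique transformation selected by the label alone (no AST context)."""
    op, name = label
    kind, alternatives = GRAMMAR[name]
    if op == "NT" and kind == "alt":
        return ("ALT", name)                    # NT,a   => ALT,a
    if op == "NT" and kind == "pdf":
        return ("TERM", name)                   # NT,p   => TERM,p
    if op == "ALT":                             # ALT,a  => NT,rk, or unchanged
        return ("NT", user_input) if user_input in alternatives else label
    if op == "TERM" and user_input is not None:
        return (f"PDF({name})", name)           # TERM,p => PDF(p),p
    return label


def autoscan(label: Label) -> Label:
    """Expand automatically while the operator is NT and the rule is defined."""
    while label[0] == "NT" and GRAMMAR[label[1]][0] != "undef":
        new = transform(label)
        if new == label:  # no automatic transformation applies
            break
        label = new
    return label


if __name__ == "__main__":
    node = autoscan(("NT", "stmt"))          # ('ALT', 'stmt'): waits for the user
    node = transform(node, "assign")         # ('NT', 'assign')
    print(node, autoscan(node))              # undefined rule: stays NT
```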
2. Transformation Restrictions



The transformations as discussed so far define only
the leaves of a possible forest of siblings which is to
replace a particular node of the AST. We now turn our
attention to designing the interior structure, if any, of
the forests generated by the transformation templates. In
the absence of other design goals or restrictions, the
driving motivation in determining the forest structure is to
obtain as much simplicity and economy of space as possible.
These goals must be balanced with the necessity to retain
semantic or schematic information to preserve the valid
derivation property, as well as to retain sufficient
structural information so that insertion and deletion
editing functions may be convenient for the user as well as
efficient algorithmically. The requirement to be able to
delete synthesized subtrees turns out to constrain the
template structures such that the other goals are also met.

In order to recover gracefully from erroneously
constructed portions of the AST, the user should have the
capability to delete any node in the AST, which, as for any
hierarchical structure, inevitably involves the ability to
delete any subtree. The valid derivation property of the
AST requires that deletion of a subtree from an AST be
realized as the replacement of the entire subtree by a node
which can validly derive that subtree and which also forms a
valid derivation with the remainder of the AST. The choice
of the transformation to be applied to a node in the AST is
based solely on the information contained in the node itself
and is completely independent of the node's context.
Therefore, deletion of a subtree must be equivalent to
replacement of that subtree by a node with the same label,
that is, the same operator and rulename, as the node which
was expanded to form the deleted subtree contained when that
node was originally created. The constraints provided by the
abstract model of Chapter II are not sufficient to guarantee
that this can be consistently and efficiently accomplished.
For example, consider a grammar which has only concatenation
rules, each of which consists entirely either of nonterminal
symbols or of terminal symbols. Since the model allows
templates without a head node to be defined for
concatenation rules which have no terminal symbols, the tree
derived from such a grammar could be a string tree,
containing no information for reconstructing a node being
considered for deletion. The only action possible for a
deletion algorithm in this case would be to delete the
entire tree. However, consider the effect of the following
proposed restrictions:

a. All immediate children of a (necessarily bound)
node must be created by the transformations of the rule by
which their father was bound.

b. When a node is bound, the rule whose transformation
bound the node is permanently recorded in the node.

c. A given transformation may generate two or more
childless siblings, or a subtree of the current node, but
not both.

d. If a subtree is created by a transformation, it
is limited to at most a single generation of children and
may consist of a single node.

Given these restrictions, the rule (and therefore,
at worst, a choice between two transformation templates)
which originally created any given node in the AST can be
identified by examining its father. Computation on the
father's rule templates allows retrieval of the unique node
from which the subtree to be deleted was formed. This
uniqueness is further discussed below.
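The deletion rule can be sketched for a concatenation head node, under stated assumptions: a hypothetical rule table, and nodes that record their operator and rulename. Because the father records the rule that bound it (restriction b) and that rule created all of its children (restriction a), the original label of any child can be recomputed from the father, even after the child's own rulename has been overwritten, for example by an alternation selection. Iteration and list heads, where two templates can be involved, are not covered.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical concatenation rule. Items: ("n", r) nonterminal, ("o", r) optional
# field "[" r "]", ("t", text) terminal (stored with the rule, not in the tree).
RULES: Dict[str, List[Tuple[str, str]]] = {
    "if_stmt": [("t", "if"), ("n", "expr"), ("t", "then"),
                ("n", "stmt"), ("o", "else_part")],
}


class Node:
    def __init__(self, op: str, rule: str, children: Optional[List["Node"]] = None):
        self.op, self.rule = op, rule  # the node's label: operator and rulename
        self.children = children or []


def original_child_labels(father: Node) -> List[Tuple[str, str]]:
    """Recompute the labels that the father's template gave to its children."""
    return [("NT" if kind == "n" else "COPT", name)
            for kind, name in RULES[father.rule] if kind != "t"]


def delete_child(father: Node, i: int) -> None:
    """Replace the subtree at child position i by a fresh node with its original label."""
    op, name = original_child_labels(father)[i]
    father.children[i] = Node(op, name)


if __name__ == "__main__":
    # Child 0 began as NT,expr; an alternation choice later overwrote its rulename.
    head = Node("HEAD", "if_stmt", [Node("NT", "relation"), Node("NT", "stmt"),
                                    Node("HEAD", "else_part", [Node("NT", "stmt")])])
    delete_child(head, 0)
    delete_child(head, 2)
    print([(c.op, c.rule) for c in head.children])
    # [('NT', 'expr'), ('NT', 'stmt'), ('COPT', 'else_part')]
```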

3. Transformation Templates

Given the restrictions developed in the previous
section, we are prepared to define the forests produced by
each of the eleven transformations. The notation utilized
in the transformation templates below is defined in Appendix
C.

a. Concatenation

Rule:

   c : x1 x2 ... xn ,   xk = { rk ! "[" rk "]" ! tk }

Template:

   NT,c  =>  headop,c ( { NT,rk    if xk = rk         } ; ... )
                        { COPT,rk  if xk = "[" rk "]" }
                 if for some k, xk = { rk ! "[" rk "]" }

   NT,c  =>  headop,c
                 if for all k, xk in T

   headop = { HEAD ! predefined function }

There are six cases to be considered in the transformation
to be applied to the label NT,c:

             nonterminals   terminals   comment
   Case 1:        0            NO       undefined rule
   Case 2:        1            NO       useless production
   Case 3:       >1            NO       head required by delete
   Case 4:        0            YES      terminals only
   Case 5:        1            YES      head required by model
   Case 6:       >1            YES      head required by model



Case 1 corresponds to the undefined rule, wherein
no right-hand side of the rule exists. The undefined rule
transformation is discussed below.

In cases 3, 5, and 6 it is required that a head
node be created: in cases 5 and 6 by the mathematical model,
for the retention of terminal information, and in all three
cases by the restrictions defined for the deletion
algorithms. In each case the head node replaces the
nonterminal under transformation, and the nonterminal and/or
optional children are realized as the immediate children of
the head node.



In case 4 a head node retaining the terminal
information replaces the nonterminal being transformed.
Since there are no nonterminals in the grammar rule for
which this form of the transformation is utilized, no
children are created. Note that this node is bound, since it
is transformed into a node which is not one of the label
forms for which transformations are defined. In fact, this
is the only bound leaf node form generated outside the realm
of predefined functions.

Case 2 is the useless production. We could,
without violating any of the restrictions thus far imposed,
define this case of the transformation as a single node
replacement, i.e., as NT,c => NT,r, thus avoiding the
creation of a head node carrying no information. However, we
see the useless production as a very rare and usually
unnecessary occurrence which does not justify the increased
algorithmic complexity required for its detection.
Therefore, it is treated in the same manner as cases 3, 5,
and 6.
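The case analysis for NT,c can be sketched as follows, assuming a rule is given as a list of hypothetical `(kind, name)` items. Following the decision just described, every case except the undefined rule receives a head node; only `HEAD` is used as the head operator, and the alternative of a predefined function as headop is not modelled.

```python
from typing import List, Optional, Tuple

Item = Tuple[str, str]   # ("n", r), ("o", r) for "[" r "]", or ("t", text)
Label = Tuple[str, str]  # (operator, rulename)
Template = Tuple[Label, List[Label]]


def case_number(items: List[Item]) -> int:
    """Classify a concatenation rule into the six cases listed for NT,c."""
    slots = sum(kind != "t" for kind, _ in items)  # nonterminal or optional fields
    has_terminals = any(kind == "t" for kind, _ in items)
    if slots == 0:
        return 4 if has_terminals else 1
    if has_terminals:
        return 5 if slots == 1 else 6
    return 2 if slots == 1 else 3


def concat_template(c: str, items: List[Item]) -> Optional[Template]:
    """NT,c => HEAD,c ( ... ) with one child per slot; None for case 1 (undefined)."""
    if case_number(items) == 1:
        return None  # handled by the undefined-rule transformation
    children = [("NT", r) if kind == "n" else ("COPT", r)
                for kind, r in items if kind != "t"]
    return ("HEAD", c), children  # case 4 yields a childless, bound head node


IF_ITEMS: List[Item] = [("t", "if"), ("n", "expr"), ("t", "then"),
                        ("n", "stmt"), ("o", "else_part")]

if __name__ == "__main__":
    print(case_number(IF_ITEMS), concat_template("if_stmt", IF_ITEMS))
```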
Implicit Template:

   COPT,r => NT,r

This label must be accompanied by some form of
user attention in order that the transformation be invoked,
the nature of which is discussed in the next section.
Assuming for the moment that the user has elected to take
the option, the transformation applied is a single node
replacement wherein the operator COPT is overwritten with
NT, and the rulename remains unchanged.

Note that the rulename in the COPT label may be
any of the six rule types, including undefined, which raises
the question of where to store the template for this
transformation. The solution is to make this transformation
implicit, that is, to apply the transformation without an
explicit template being stored in the grammatical data base.
This may be done since the transformation is invariant over
all rules in any grammar, depending only on the requisite
user attention and the COPT operator.

b. Alternation

Rule: 

   a : r1 "!" r2 ... "!" rn

Template 1:

   NT,a => ALT,a

The transformation for the label NT,a is a single
node replacement: the operator NT is replaced with ALT,
and the rulename remains unchanged.

Template 2:

ALT,a => NT,rk    if user input valid
         ALT,a    otherwise

This label must be accompanied by user input
indicating which of the alternatives is desired; suppose for
the moment it is the kth. The transformation applied is a
single node replacement wherein the operator ALT becomes NT
and the alternation rulename is overwritten with the
rulename of the kth alternative. If the user input does not
correspond to any of the alternatives, the transformation
returns the node unchanged.
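Both alternation templates amount to overwriting label fields in place. The following Python sketch is illustrative only: the `Node` class and the representation of an alternation rule as a dictionary from keystroke to alternative rulename are assumptions, not the editor's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: an (operator, rulename) label plus children."""
    op: str
    rule: str
    children: list = field(default_factory=list)


# Hypothetical alternation rule "statement": keystroke -> alternative rulename.
STATEMENT = {"a": "assignment", "c": "conditional", "b": "block"}


def alt_template1(node: Node) -> None:
    """Template 1, NT,a => ALT,a: overwrite the operator, keep the rulename."""
    assert node.op == "NT"
    node.op = "ALT"


def alt_template2(node: Node, alternatives: dict, user_input: str) -> None:
    """Template 2, ALT,a => NT,rk if the input selects alternative k;
    any other input leaves the node unchanged."""
    assert node.op == "ALT"
    chosen = alternatives.get(user_input)
    if chosen is not None:
        node.op = "NT"
        node.rule = chosen  # the alternation rulename is overwritten


n = Node("NT", "statement")
alt_template1(n)
alt_template2(n, STATEMENT, "x")  # no such alternative: node unchanged
print(n.op, n.rule)               # ALT statement
alt_template2(n, STATEMENT, "c")
print(n.op, n.rule)               # NT conditional
```

In an editor built this way, the resulting NT,conditional node would next be expanded by the first template of the rule `conditional`.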
c. Iteration

Rule: 

i : r 

Template 1:

NT,i => ITER,i ( NT,r ; IOPT,i )

While not required by the mathematical model, a
head node is created by the transformation for the label
NT,i to fulfill the deletion requirements. The two leaves
specified by the model are formed as the immediate children
of the head node in which the operator NT was replaced by
ITER. A side effect of the invariant creation of a head
node is that, while inconsistent with the model, terminal
information applicable to every real child in the iteration
sibling string, as opposed to the trailing IOPT child, could
be included in the iteration rule if an appropriate
extension were made to the R-ARGOT notation.

Template 2:

IOPT,i => NT,r ; IOPT,i

Triggered by the appropriate user input, the
transformation for the label IOPT,i replaces the node with a
pair of siblings which are the leaves required by the model.
Note that the rulename in the IOPT label is the same
rulename which bound its father. Thus, all children of the
ITER node, whether formed when the ITER node was bound or
subsequently when the IOPT node was expanded, are formed by
one of the transformations under the rulename stored in the
ITER node, as required.
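The two iteration templates can be sketched in the same way. This is a minimal illustration under assumed data structures: `Node` is hypothetical, and the iterated rulename `r` of the rule `i : r` is passed in directly rather than looked up in the grammatical data base. Because template 2 replaces one node with two siblings, the sketch applies it through the ITER parent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: an (operator, rulename) label plus children."""
    op: str
    rule: str
    children: list = field(default_factory=list)


def iter_template1(node: Node, r: str) -> None:
    """Template 1, NT,i => ITER,i ( NT,r ; IOPT,i ): the node becomes the
    head node and receives the two leaves required by the model."""
    assert node.op == "NT"
    node.op = "ITER"
    node.children = [Node("NT", r), Node("IOPT", node.rule)]


def iter_template2(it: Node, r: str) -> None:
    """Template 2, IOPT,i => NT,r ; IOPT,i: the trailing IOPT child of the
    ITER node is replaced by a pair of siblings."""
    iopt = it.children[-1]
    assert iopt.op == "IOPT" and iopt.rule == it.rule  # same rulename as father
    it.children[-1:] = [Node("NT", r), Node("IOPT", it.rule)]


stmts = Node("NT", "stmts")
iter_template1(stmts, "stmt")
iter_template2(stmts, "stmt")  # the user elects the IOPT option once
print([(c.op, c.rule) for c in stmts.children])
# [('NT', 'stmt'), ('NT', 'stmt'), ('IOPT', 'stmts')]
```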
d. List 

Rule: 

l : r1 x "..." , x = { r2 | "[" r2 "]" | t }

Template 1:

NT,l => LIST,l ( NT,r1 ; LOPT,l )

The transformation for the label NT,l replaces
the operator NT with the operator LIST, forming a head node
as required by the model in the case that the second
right-hand-side argument of the grammar rule is a nonterminal,
and in every case by the deletion requirements. The required
leaves form a sibling string under the LIST node.

Template 2:

          NT,r2 ; NT,r1 ; LOPT,l      if x = r2
LOPT,l => COPT,r2 ; NT,r1 ; LOPT,l    if x = "[" r2 "]"
          NT,r1 ; LOPT,l              if x = t

The transformation for this label has three
forms, as indicated, for the three possible cases. In all
cases, the LOPT node being transformed is replaced with a
sibling string as shown, the nodes of which are the required
leaves. As in the IOPT transformation, the LOPT label carries
the same rulename as its father, so that all children
created under a LIST head node are derived from a common
parent rule.
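A sketch of the two list templates, covering the three forms of the second one, follows. As before, `Node` is a hypothetical structure, and the rulenames `r1` and `r2` and the kind of separator `x` are passed in explicitly instead of being read from a stored template; these are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical AST node: an (operator, rulename) label plus children."""
    op: str
    rule: str
    children: list = field(default_factory=list)


def list_template1(node: Node, r1: str) -> None:
    """Template 1, NT,l => LIST,l ( NT,r1 ; LOPT,l )."""
    assert node.op == "NT"
    node.op = "LIST"
    node.children = [Node("NT", r1), Node("LOPT", node.rule)]


def list_template2(lst: Node, r1: str, sep: str, r2: Optional[str] = None) -> None:
    """Template 2: expand the trailing LOPT,l child of a LIST node.
    sep is "nonterminal" (x = r2), "optional" (x = "[" r2 "]") or
    "terminal" (x = t; the terminal itself appears only in the schema)."""
    lopt = lst.children[-1]
    assert lopt.op == "LOPT" and lopt.rule == lst.rule  # same rulename as father
    tail = [Node("NT", r1), Node("LOPT", lst.rule)]
    if sep == "nonterminal":
        new = [Node("NT", r2)] + tail
    elif sep == "optional":
        new = [Node("COPT", r2)] + tail
    elif sep == "terminal":
        new = tail
    else:
        raise ValueError(f"unknown separator kind {sep!r}")
    lst.children[-1:] = new


stmts = Node("NT", "statements")
list_template1(stmts, "statement")
list_template2(stmts, "statement", "terminal")
print([(c.op, c.rule) for c in stmts.children])
# [('NT', 'statement'), ('NT', 'statement'), ('LOPT', 'statements')]
```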

e. Predefined 

Rule: 

p : pdf 

Template 1:

NT,p => TERM,p

The transformation for the label NT,p is a sin- 
gle node replacement, the NT operator being overwritten with 
TERM and the rulename remaining unchanged. 

Template 2:

TERM,p => PDF(p, string),p    if PDF(p, string) valid
          TERM,p              otherwise

The label TERM,p must be accompanied by
appropriate user input before the transformation is applied.
The exact nature of the transformation applied is dependent
upon the predefined rulename, but certain characteristics of
the transformation may be generalized. The transformation
results in either a single node replacement or a possibly
many-leveled subtree; it may not generate siblings or a
forest of siblings. As regards the deletion restrictions,
the subtree created by a predefined function is considered a
single unit for editing purposes that is not subject to
internal deletions or insertions. System-provided
predefined rules, if the input is valid, invariably result in a
bound node or subtree of bound nodes; a free node in the
subtree would imply knowledge of language-specific grammar
rules which no general purpose predefined function could
have. User-supplied predefined functions, allowable as a
language-specific extension to the system, may admit such
free nodes; however, the language implementor is responsible
for ensuring that the syntactic integrity of the AST is preserved
over such transformations.

If the input accompanying the label is rejected
by the predefined function, the transformation is null and
the node is unchanged.
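A sketch of the second predefined template follows. The predefined function table, the rulename `number`, and the encoding of the scanner's result as an operator string are hypothetical; the point is only that rejected input leaves the TERM node untouched, while valid input yields a bound node that keeps the rulename p.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical AST node: an (operator, rulename) label plus children."""
    op: str
    rule: str
    children: list = field(default_factory=list)


def scan_number(string: str) -> Optional[str]:
    """Hypothetical predefined function: accept a string of digits and return
    the operator of the bound node to create, or None to reject the input."""
    if re.fullmatch(r"[0-9]+", string):
        return "NUM:" + str(int(string))
    return None


# Hypothetical table of predefined functions, keyed by predefined rulename.
PREDEFINED = {"number": scan_number}


def term_template2(node: Node, string: str) -> Node:
    """TERM,p => PDF(p, string),p if the input is valid, else TERM,p unchanged.
    Returns the node that should now occupy this position in the AST."""
    assert node.op == "TERM"
    result = PREDEFINED[node.rule](string)
    if result is None:
        return node                 # rejected: null transformation
    return Node(result, node.rule)  # bound node; rulename p is kept


t = Node("TERM", "number")
print(term_template2(t, "4x2") is t)  # True
print(term_template2(t, "042").op)    # NUM:42
```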

f. Undefined
Implicit Template:

NT,u => NT,u

The undefined label undergoes a null, implicit
transformation.

4. User Attention

Of the eleven transformations, six define the action
to be taken for the six possible nonterminal labels. The
remaining five, the second transformation template for each
of the five defined rule types, all require some form of
user attention prior to the application of the specified
template. The form of user attention required is dependent
upon the operator but generally may be characterized as
consisting of two parts: an indication that the user wishes to
direct attention to the current node, and a possibly empty
character string utilized by the transformation as an input
parameter. The five transformations requiring user
attention fall into three classes, as follows:

a. IOPT, COPT, LOPT

The three optional operators require simply that
the user elect to expand the optional node. Thus directing
attention to an optional node is sufficient for application
of the template, and the character string parameter is not
required.

b. ALT 

The Alternation operator requires that the user, 
after directing attention to the alternation node, provide a 
character to be utilized in determining which of the possi- 
ble alternatives is desired. 

c. TERM 

The TERM operator requires, in addition to the 
user's attention, a character string for processing by the 
predefined rule associated with the node. 

The exact format of the user attention parameter 
is implementation dependent, but is summarized abstractly as 
follows, by operator: 



operator    user attention

COPT        <elect option>
IOPT        <elect option>
LOPT        <elect option>
ALT         <char>
TERM        <string>


Deletion and Insertion
Earlier it was asserted that templates defined in
accordance with an appropriate set of restrictions would
allow deletion of any subtree from the AST using only the
rulename of the subtree's parent node. We now verify that
assertion based on the templates as defined above.

Of the six rule types, three may be excluded from
consideration as potential parents of nodes to be deleted.
Undefined rules never form children and thus are never
referenced for deletion. Predefined rules are defined to
create subtrees which can be edited only as complete units.
Alternation rulenames never appear in bound nodes of the AST,
since the alternation rulename in a free node is overwritten
with the rulename of the alternative rule chosen. Thus only
concatenation, iteration, and list rules remain as potential
parents of subtrees whose deletion is desired. The parent's
rule type in each of these three cases may be positively
identified by the parent node's operator: if the operator is
ITER, then the parent rule is an iteration; if LIST, then it
is a list rule; and otherwise (either HEAD or a
predefined function), the parent rule is a concatenation.
The templates for these three rule types allow
recreation of the original label which existed when the root
node of the subtree to be deleted was initially created.

A parent concatenation rule, upon initial expansion,
creates a fixed number of children, each of the form NT,r
or COPT,r. By inspection, no transformation or sequence of
transformations on these labels for any of the six rule
types may create additional siblings under the parent
concatenation rule, nor may they reorder the subtrees initially
created. Thus the initial fixed number and order of
children created remains constant. Suppose some subtree, say
the ith, under the concatenation rule parent is selected for
deletion. The sibling which was originally created by the
concatenation rule as its ith child may be reconstructed by
traversing the concatenation rule template until the ith
sibling list element is encountered. This sibling list
element contains the information by which the node replacing
the subtree to be deleted may have its operator and rulename
fields reinitialized.

Deletion of a subtree under an iteration
rule parent node is made possible by the consistent
manner in which the two iteration rule templates create
children of the parent node. The first child is created by
the first template, and the deletion process for the first
subtree is similar to concatenation deletion. Subsequent
subtrees, up to the trailing IOPT,i node, are created by the
second template, and the information necessary to recreate
any label may be retrieved from the first sibling list
element of that template. The IOPT,i child is invariant in
location and form and is not subject to deletion.
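The label-recreation argument for concatenation and iteration parents can be sketched as follows. The template store `TEMPLATES`, its layout, and the rulenames in it are illustrative assumptions, and list parents are omitted for brevity; only the parent's operator and rulename are consulted, as the argument requires.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: an (operator, rulename) label plus children."""
    op: str
    rule: str
    children: list = field(default_factory=list)


# Hypothetical template store keyed by rulename. A concatenation keeps the
# ordered sibling list its template creates; an iteration "i : r" keeps r.
TEMPLATES = {
    "simple": ("concatenation",
               [("NT", "name"), ("NT", "decls"), ("COPT", "externals"), ("NT", "block")]),
    "stmts": ("iteration", "stmt"),
}


def original_label(parent: Node, i: int) -> tuple:
    """Recreate the label that child i (0-based) had when it was first created,
    using only the parent node's operator and rulename."""
    kind, data = TEMPLATES[parent.rule]
    if parent.op == "ITER":
        assert kind == "iteration"
        if i == len(parent.children) - 1:
            raise ValueError("the trailing IOPT child is not subject to deletion")
        return ("NT", data)        # both iteration templates create NT,r children
    assert kind == "concatenation"  # HEAD or predefined-function operator
    return data[i]                 # ith element of the template's sibling list


def unparse(parent: Node, i: int) -> None:
    """Deletion by replacement: collapse subtree i to its initial single node."""
    op, rule = original_label(parent, i)
    parent.children[i] = Node(op, rule)


prog = Node("HEAD", "simple", [Node("NT", "name"), Node("NT", "decls"),
                               Node("HEAD", "externals", [Node("NT", "ext")]),
                               Node("HEAD", "block")])
unparse(prog, 2)         # the elected [externals] option reverts to COPT
print(prog.children[2])  # Node(op='COPT', rule='externals', children=[])
```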

Deletion of the first subtree under a list rule
parent is handled in the same manner as the first subtree
under an iteration parent. Subsequent subtrees, up to the
LOPT,l node, are also similar to iteration rule subtrees,
except that they may have been created in pairs. Examination
of the list rule's second template will reveal whether
subtrees after the first must be treated in pairs or may be
handled singly. In either event, the information necessary
to recreate any given child is available in the template.
The LOPT,l child is not subject to deletion.

So far deletion has been concerned only with
"unparsing" an incorrectly formed subtree to a single
ancestor node so that the subtree may be correctly reconstructed.
For subtrees of concatenation rules this is the only form of
deletion which retains the valid derivation property.
Subtrees of iteration rules, however, are all derived from the
same label and thus are all syntactically equivalent when
viewed from their root. Further, the only restriction on
the number of iteration rule node subtrees is that there
must be at least one in addition to the IOPT node. Thus,
deletion of an iteration rule subtree, excepting
the trailing IOPT node, could be realized as the actual
physical deletion of the entire subtree including the root
node, as long as at least one subtree remains. As a
corollary, a node properly labelled in accordance with the
iteration parent rule could be inserted in front of any node in
the iteration sibling string without violating the valid
derivation property. The insertion procedure requires the
same information as deletion, the rule type and rulename of
the parent node, in order to construct an appropriately
labelled node for insertion into an existing iteration node
sibling string.

List rules whose second argument is a terminal
symbol form AST structures equivalent to iteration constructs,
and thus physical deletion (as opposed to unparsing to a
single node) as well as insertion are valid operations.
List rules in general present a more complicated problem in
that subtrees after the first are formed in pairs. However,
extending the argument concerning syntactic equivalence of
subtrees to pairs of subtrees is straightforward and allows
physical deletion and insertion to apply to list rule
subtrees as well.

In summary, deletion is realized as a replacement
operation for all concatenation rule subtrees and for
solitary iteration and list rule subtrees, wherein the subtree
to be deleted is replaced by a single node which is a
reconstruction of the subtree's initial state. Under iteration
and list parents where other subtrees exist, deletion
results in the physical removal of the subtree or subtree
pair; reconstruction may be accomplished at the same or some
other location under the parent by a separate insertion
operation.
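Physical deletion and insertion under an iteration parent, with the fallback to replacement for a solitary subtree, might look like the sketch below. It assumes a hypothetical `Node` class and passes the iterated rulename `r` explicitly instead of retrieving it from the parent's template.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: an (operator, rulename) label plus children."""
    op: str
    rule: str
    children: list = field(default_factory=list)


def delete_child(it: Node, i: int, r: str) -> None:
    """Delete child i of an ITER node whose rule is "i : r"."""
    real = len(it.children) - 1            # children before the trailing IOPT
    if not 0 <= i < real:
        raise IndexError("only children before the trailing IOPT may be deleted")
    if real > 1:
        del it.children[i]                 # physical deletion of the subtree
    else:
        it.children[i] = Node("NT", r)     # solitary subtree: replace instead


def insert_child(it: Node, i: int, r: str) -> None:
    """Insert a freshly labelled NT,r node in front of child i (i may be the IOPT)."""
    if not 0 <= i < len(it.children):
        raise IndexError("insertion point must be an existing child")
    it.children.insert(i, Node("NT", r))


it = Node("ITER", "stmts", [Node("HEAD", "stmt"), Node("HEAD", "stmt"), Node("IOPT", "stmts")])
delete_child(it, 0, "stmt")  # two real children: physical removal
delete_child(it, 0, "stmt")  # now solitary: replaced by NT,stmt
insert_child(it, 1, "stmt")  # inserted in front of the IOPT node
print([(c.op, c.rule) for c in it.children])
# [('NT', 'stmt'), ('NT', 'stmt'), ('IOPT', 'stmts')]
```

The same two operations extend to list parents once subtrees after the first are handled in pairs where the list rule requires it.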



C. DISPLAY SCHEMAS 



Thus far a method of constructing an AST has been
developed utilizing transformations to expand nodes in
accordance with a set of templates sorted by rulename such
that the AST represents a valid derivation of the associated
grammar. Attention is now focused on displaying the AST; in
particular, a method is developed in this section by which
the valid derivation of the grammar which the AST represents
may be displayed.

Display of the AST is the result of a generalized 
inorder traversal, beginning with the root node, with termi- 
nal and nonterminal symbols being displayed in accordance 
with schemas associated with each label. The display need 
not be strictly preorder since provision is made to display 
subtrees under a parent node in any order as directed by the 
parent's rule schema. This capability is provided to allow 
for the case where the evaluator may have to access the sub- 
trees in a different order than that implied by the syntax 
of the target language. 

Schemas are referenced by the rulename associated with
each bound and free node in a manner similar to the
referencing of templates, so that the display associated with
a subtree is independent of the context of that subtree.

The valid derivation need not be displayed in its
entirety. For example, the means is provided to display all
undefined nonterminals as they occur in the AST as part of
the valid derivation. If the language implementor chooses,
however, he may elect not to display any of the undefined
nonterminals which appear in a partial grammar he is
implementing in its incomplete state.

In the following two sections, first the schema language
is defined, and then the formation of schemas for each of the
rule types is developed.

1. Schema Language

There are three types of display information
provided for in the schema language: format control, literal
strings, and subtree indicators. A system for handling
comments has not yet been developed; however, it is envisioned
as an extension to the schema language and not as part of
the grammar for the target language.

Format control information is encoded mnemonically
in the double capital-letter strings "NL", "TB", and "UT",
interpreted respectively as "newline", "tab", and "untab".
UT simply causes a variable, "tabcount", to be decremented.
TB causes a tab control character to be transmitted to the
output device and increments "tabcount". NL causes a
newline character and "tabcount" tabs to be transmitted to the
output device. Format control information is provided for
readability only.
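The behaviour of the three format control mnemonics can be sketched as a small interpreter. This is an illustration under stated assumptions: the schema is taken to be already split into tokens, literal strings keep their surrounding double quotes, and the output device is modelled as a returned string.

```python
def emit(tokens: list) -> str:
    """Interpret format control and literal-string tokens of a schema."""
    out = []
    tabcount = 0
    for tok in tokens:
        if len(tok) >= 2 and tok.startswith('"') and tok.endswith('"'):
            out.append(tok[1:-1])               # literal string: sent unchanged
        elif tok == "NL":
            out.append("\n" + "\t" * tabcount)  # newline, then tabcount tabs
        elif tok == "TB":
            out.append("\t")                    # one tab now ...
            tabcount += 1                       # ... and one more after each NL
        elif tok == "UT":
            tabcount -= 1                       # untab only changes the counter
        else:
            raise ValueError(f"unknown token {tok!r}")
    return "".join(out)


text = emit(['"begin"', "TB", '"x := 1;"', "NL", '"y := 2"', "UT", "NL", '"end"'])
print(repr(text))  # 'begin\tx := 1;\n\ty := 2\nend'
```

A negative tabcount simply produces no tabs here, since Python's `"\t" * n` is empty for n <= 0; the text does not say how the original handles an unmatched UT.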

Literal strings are arbitrary character strings,
delimited by double quotes, that are transmitted directly to
the output device. Literal strings provide the mechanism
for the display of terminal and nonterminal symbols in the
derivation represented by the AST.

A subtree indicator, denoted by a dollar sign
followed by an integer interpreted as a child number, directs
that the subtree be entirely displayed prior to resumption
of display of the current schema. An optional display
field, consisting of an equals sign followed by a literal
string, may accompany the subtree indicator to provide the
means for displaying undefined nonterminals, the three
optionals, and TERM nodes, as described in the following
paragraphs.
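The three kinds of schema information can be recognised with a single regular expression. The sketch below is a hypothetical tokenizer, not part of the described system; it assumes whitespace between tokens is insignificant and that literal strings contain no embedded double quotes.

```python
import re

# One schema token: a literal string, a subtree indicator with an optional
# display field, or one of the format control mnemonics.
TOKEN = re.compile(
    r'\s*(?:"(?P<lit>[^"]*)"'                     # "literal string"
    r'|\$(?P<child>\d+)(?:="(?P<field>[^"]*)")?'  # $j or $j="display field"
    r"|(?P<fmt>NL|TB|UT))"                        # format control
)


def tokenize(schema: str) -> list:
    """Split a schema into ("lit", text), ("sub", j, field) and ("fmt", code)."""
    tokens, pos, end = [], 0, len(schema.rstrip())
    while pos < end:
        m = TOKEN.match(schema, pos)
        if m is None:
            raise ValueError(f"unrecognised schema text at {pos}: {schema[pos:]!r}")
        if m.group("lit") is not None:
            tokens.append(("lit", m.group("lit")))
        elif m.group("child") is not None:
            tokens.append(("sub", int(m.group("child")), m.group("field")))
        else:
            tokens.append(("fmt", m.group("fmt")))
        pos = m.end()
    return tokens


print(tokenize('"begin" TB $1 NL "end"'))
# [('lit', 'begin'), ('fmt', 'TB'), ('sub', 1, None), ('fmt', 'NL'), ('lit', 'end')]
```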

An undefined nonterminal may appear for a variety of
reasons, the most common being as a placeholder in a partial
grammar. Since the rule for the nonterminal does not exist,
there can be no schema, so the optional field, if provided,
is invariably utilized. If not provided, nothing will be
displayed for the undefined nonterminal.

The three optional nodes, COPT, IOPT, and LOPT,
require special handling since there is nothing inherently
"optional" about a rule. Rather, the optional nodes are
placeholders to indicate to the user the possibility that
the rule specified may be invoked, if the user so chooses,
but also may be left uninvoked in a "complete" AST. Since
it is the father rule which holds the information that this
rule invocation may be an as yet unelected option, the
father rule schema contains the information, in the form of
an optional display field, to display the node accordingly.

The predefined rule referenced by a TERM node is in
general a language-independent system routine. As such, it
has no knowledge of the nonterminal name which it, when
invoked by the user on a string, is replacing in the valid
derivation. Since the father rule does have this
information, the father rule schema contains the optional display
field necessary to properly display, within the context of
the grammar, the rulename which the predefined rule will
replace. In other words, this facility allows the language
implementor to rename the predefined rule for display
purposes.

When an option has been elected or a TERM node
predefined rule has produced a bound node, both of which are
displayable in their own right, the optional field
associated with the subtree indicator is no longer necessary and
will be ignored by the display algorithm. While these nodes
remain free, however, the optional display field provides
the user the information he needs to expand these nodes, as
well as a logical symbol under which the GDE may place the
cursor to indicate the current node.
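The rule for when the optional display field applies can be stated as a short function. In this sketch the `Node` class, the `undefined` set of rulenames without grammar rules, and the `display` callback standing in for a child's own schema are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical AST node: an (operator, rulename) label plus children."""
    op: str
    rule: str
    children: list = field(default_factory=list)


OPTIONAL_OPS = {"COPT", "IOPT", "LOPT"}


def show_indicator(child: Node, display_field: Optional[str], undefined: set,
                   display: Callable[[Node], str]) -> str:
    """Render subtree indicator $j (optionally $j="field") for one child."""
    free = (child.op in OPTIONAL_OPS                         # unelected option
            or child.op == "TERM"                            # predefined rule not yet applied
            or (child.op == "NT" and child.rule in undefined))
    if free:
        return display_field or ""  # no display field: nothing is shown
    return display(child)           # elected or bound: the field is ignored


def own_schema(node: Node) -> str:
    """Stand-in for displaying a node through its own schema."""
    return node.rule + "-subtree"


print(show_indicator(Node("COPT", "externals"), "[externals]", set(), own_schema))
# [externals]
print(show_indicator(Node("HEAD", "externals"), "[externals]", set(), own_schema))
# externals-subtree
```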
A subtree indicator which may reference one of the
three node types discussed above must, in order that a valid
derivation be displayed, include an appropriate optional
display field. The implementor may, of course, omit such a
display field, in which case nothing will be displayed for
the node. In the case of an undefined nonterminal this may
be the most pleasing result; in the case of optionals and
TERM nodes, such a display will not accurately reflect all
free nodes in the AST that may be of interest to the user.
The omission of such an optional display field may be
regarded under normal circumstances as a mistake in the
language definition.

2. Rule-Specific Schemas

Construction of schemas is a straightforward
process when keyed to rule type, since the schema subtree
indicators and literal strings must conform to both the R-ARGOT
grammar rule definition and the transformation templates
associated with the rule definition in a consistent way. In
the schema constructions which follow, format control
information is ignored, but generally it may be inserted into a
schema any place that a terminal symbol is allowed.

a. Concatenation

Rule: 

c : x1 x2 ... xn , xk = { rk | "[" rk "]" | tk }

Schema:

cs : s1 s2 ... sn , sk = "tk"              if xk = tk
                         $j="[rulename]"   if child j is optional
                         $j="<rulename>"   if child j is predefined
                         $j="(rulename)"   if child j is undefined
                         $j                otherwise



A single schema is required for the concatenation
rule and may be constructed, if all nonterminals are
realized as children in the order they are listed in the
R-ARGOT rule, as follows:

Reading the R-ARGOT concatenation rule from left to
right, for each symbol xk:

if xk is a terminal symbol, copy it to
   the schema as a literal string;
if xk is the jth nonterminal and is optional,
   write $j="[rulename]" to the schema;
if xk is the jth nonterminal and is predefined,
   write $j="<rulename>" to the schema;
if xk is the jth nonterminal and is undefined,
   write $j="(rulename)" to the schema;
if xk is the jth nonterminal symbol and is
   not optional, undefined, or a predefined
   rule, write $j to the schema.
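The construction steps above translate almost line for line into code. In this sketch the rule's right-hand side is assumed to be given as a list of symbols spelled as in R-ARGOT (terminals with their quotes, optional nonterminals in square brackets), and the sets of predefined and defined rulenames are supplied by the caller; both representations are assumptions.

```python
def concat_schema(symbols: list, predefined: set, defined: set) -> str:
    """Build a concatenation schema from the right-hand side of an R-ARGOT rule."""
    parts = []
    j = 0                                  # number of nonterminals seen so far
    for x in symbols:
        if x.startswith('"'):              # terminal: copy as a literal string
            parts.append(x)
            continue
        j += 1                             # x is the jth nonterminal
        if x.startswith("[") and x.endswith("]"):
            parts.append(f'${j}="{x}"')    # optional:   $j="[rulename]"
        elif x in predefined:
            parts.append(f'${j}="<{x}>"')  # predefined: $j="<rulename>"
        elif x not in defined:
            parts.append(f'${j}="({x})"')  # undefined:  $j="(rulename)"
        else:
            parts.append(f"${j}")          # otherwise:  $j
    return "".join(parts)


rule = ['"program"', "name", "decls", "[externals]", "block", '"end"']
print(concat_schema(rule, predefined={"name"}, defined={"block", "externals"}))
# "program"$1="<name>"$2="(decls)"$3="[externals]"$4"end"
```

Applied to the rule `simple` of the following example, this sketch yields the same schema given there.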
This algorithm for the construction of a
concatenation schema is for the display of the entire valid
derivation. If display of an undefined nonterminal, for
example, is not desired, the subtree indicator for that
child could either be written without the optional display
field or be omitted entirely. While this algorithm assumes
that the implementor wrote the concatenation template such
that the children correspond in order to the nonterminals in
the rule, this need not be the case. The schema must know
the order, however, so that the display is an accurate
representation of the derivation obtained from the grammar.

As an example of each of the possibilities
listed above, consider the concatenation rule

simple : "program" name decls [externals] block "end" .

where the nonterminal "name" refers to a predefined
function, "decls" is an undefined nonterminal, "externals" is
optional, and "block" is a well-defined, non-optional,
non-predefined nonterminal. The schema for this rule, without
any format control characters, would be

"program"$1="<name>"$2="(decls)"$3="[externals]"$4"end"
b. Alternation 

Rule: 

a : "{" char1:x1 "|" char2:x2 "|" ... "|" charn:xn "}"

Schemas:

as1 : "{alternation rulename}"

as2 : "{char1:rulename1 | ... | charn:rulenamen}"
Since the transformations defined for an
alternation rule are both single node replacements, the second
of which results in the alternation rulename being
overwritten, it is clear that no semantic or schematic
information required in a sentence in the language, as
opposed to a valid derivation in general, may be associated
with the schema for an alternation rule: once the
alternative choice is made by the user, the rulename, and
thus access to the schema, is no longer present in the AST.
Thus the schema for an alternation rule could have been
implemented as a subtree indicator optional field. We
choose to provide a pair of explicit display schemas
associated with the alternation rulename, however, to implement a
"help" mechanism. The first display schema consists simply
of a literal string composed of the alternation rulename in
curly brackets and is the schema normally used to display
the node. The second, optional at user request, is again
simply a literal string but with the alternative rules and
their associated keystrokes displayed in curly brackets.

For example, the following alternation rule

statement : { a:assignment | c:conditional | b:block }

would be displayed normally by the schema

"{statement}"

or, if the user desired to see the alternatives and their
keystrokes, by

"{a:assignment | c:conditional | b:block}"
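Both alternation schemas are literal strings derived mechanically from the rule. The sketch below builds their text; representing the alternatives as an ordered list of (keystroke, rulename) pairs is an assumption made for illustration.

```python
def alternation_schemas(rulename: str, alternatives: list) -> tuple:
    """Return the text of as1 (normal display) and as2 (the "help" display).
    `alternatives` is a list of (keystroke, rulename) pairs in rule order."""
    as1 = "{" + rulename + "}"
    as2 = "{" + " | ".join(f"{key}:{alt}" for key, alt in alternatives) + "}"
    return as1, as2


normal, help_text = alternation_schemas(
    "statement", [("a", "assignment"), ("c", "conditional"), ("b", "block")])
print(normal)     # {statement}
print(help_text)  # {a:assignment | c:conditional | b:block}
```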
c. Iteration

Rule: 

i : " t " r 

Schemas:

is1 : $1

is2 : "[iteration rulename]"

The iteration (as well as the list) rules differ
from concatenation in that they may have an indefinite
number of children requiring display. Since no terminals
are allowed in an R-ARGOT iteration rule, and since every
child is formed independently of the others in the sibling
string, display of an iteration, while involving some work
on the part of the display algorithm to traverse all of the
subtrees one at a time, requires a pair of very simple
schemas. The first is simply a subtree indicator used for
display of all subtrees except the last. The subtree
indicator may include an optional field for undefined and
predefined rule display; from the transformation template
definitions it is apparent that no child of an iteration node can
be a concatenation optional node. The second schema is used
for display of the last child, invariably an IOPT node.
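The traversal for an iteration node can be sketched as follows. The `Node` class is hypothetical, and `show` stands in for displaying a child through the subtree indicator of is1 (in practice, through the child's own schema).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """Hypothetical AST node: an (operator, rulename) label plus children."""
    op: str
    rule: str
    children: list = field(default_factory=list)


def display_iteration(it: Node, show: Callable[[Node], str]) -> str:
    """Apply is1 ($1) to every child but the last, then is2 to the trailing IOPT."""
    *real, last = it.children
    assert last.op == "IOPT"
    return "".join(show(child) for child in real) + "[" + it.rule + "]"


# Stand-in children whose rulename field holds their display text.
it = Node("ITER", "stmts", [Node("X", "s1 "), Node("X", "s2 "), Node("IOPT", "stmts")])
print(display_iteration(it, lambda n: n.rule))  # s1 s2 [stmts]
```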

d. List

Rule: 

l : r1 x "..." , x = { r2 | "[" r2 "]" | t }
Schemas:

ls1 : $1

ls2 : $1$2                  if x = r2
      $1="[rulename2]"$2    if x = "[" r2 "]"
      "t"$1                 if x = t

ls3 : "[list rulename]"

The list rule requires three schemas in order to
properly display the unique format the list structure
conveys. Like the iteration rule, the list may have an
indefinite number of subtrees; however, R-ARGOT allows the
second argument to be a terminal symbol. Without this
facility the inclusion of the list rule type is hardly
justified, since the most usual use of the construct is to
separate grammatical entities with some punctuation mark.

The first schema is used for display of the
first child. Subsequent children or pairs of children,
depending on the specific list rule, up to the last in the
sibling string, are displayed by the second schema. The
display algorithm must keep track of which children it has
displayed in traversing the list in order that this
schema structure display the sequence of subtrees correctly.
The third schema is used for display of the last child,
invariably an LOPT node.

As an example of the list rule schemas, consider
the R-ARGOT rule

statements : statement ";" ... .

The schemas generated to display this rule would be

ls1 : $1
ls2 : ";"$1
ls3 : "[statements]"

Note that an NL format control character would be appropriate
after the ";" terminal in ls2 and before the literal string
in ls3 in order to place each statement and semicolon pair
on a separate line.
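The bookkeeping the display algorithm needs for a list, namely whether later children come singly or in pairs, can be sketched as below. The `Node` class and the `sep` argument naming the kind of x in the rule are assumptions; `show` stands in for each child's own schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical AST node: an (operator, rulename) label plus children."""
    op: str
    rule: str
    children: list = field(default_factory=list)


def display_list(lst: Node, show: Callable[[Node], str], sep: str,
                 terminal: Optional[str] = None) -> str:
    """Display a LIST node with ls1, ls2 and ls3.
    sep is "terminal", "nonterminal" or "optional" (the kind of x in the rule)."""
    *real, last = lst.children
    assert last.op == "LOPT"
    out = [show(real[0])]                      # ls1 : $1
    step = 1 if sep == "terminal" else 2       # later children: singly or in pairs
    for k in range(1, len(real), step):
        group = real[k:k + step]
        if sep == "terminal":
            out.append(terminal + show(group[0]))                   # "t"$1
        elif group[0].op == "COPT":
            out.append("[" + group[0].rule + "]" + show(group[1]))  # $1="[rulename2]"$2
        else:
            out.append(show(group[0]) + show(group[1]))             # $1$2
    out.append("[" + lst.rule + "]")           # ls3 : "[list rulename]"
    return "".join(out)


stmts = Node("LIST", "statements",
             [Node("X", "a := 1"), Node("X", "b := 2"), Node("LOPT", "statements")])
print(display_list(stmts, lambda n: n.rule, "terminal", ";"))
# a := 1;b := 2[statements]
```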

e. Predefined 

A predefined display function should accompany
each predefined rule scanner. The display algorithm will
pass the subtree created by the predefined scanner to the
named display function. For example, the predefined scanner
"id" will scan an identifier, place it in the symbol table,
and fill in the TERM node with the information allowing
reference to that symbol table entry for the evaluator. On
display, the routine "idout" will be called to cause the
referenced identifier to be displayed.
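A scanner and its display function might be paired as follows. Only the names "id" and "idout" come from the text; the symbol table as a Python list, the node layout with a `ref` field, the bound operator "ID", and the use of Python's identifier syntax are hypothetical choices for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

symbol_table = []  # hypothetical symbol table of identifiers


@dataclass
class Node:
    """Hypothetical TERM node with room for a symbol-table reference."""
    op: str
    rule: str
    ref: Optional[int] = None


def id_scan(node: Node, string: str) -> bool:
    """Sketch of the "id" scanner: enter the identifier in the symbol table and
    fill in the node with a reference to that entry. Returns False on rejection."""
    if not string.isidentifier():  # assumption: Python's identifier syntax
        return False               # rejected: node left unchanged
    if string not in symbol_table:
        symbol_table.append(string)
    node.op = "ID"                 # hypothetical operator of the bound node
    node.ref = symbol_table.index(string)
    return True


def idout(node: Node) -> str:
    """Sketch of the matching display function: show the referenced identifier."""
    return symbol_table[node.ref]


DISPLAY = {"id": idout}  # predefined rulename -> display function

n = Node("TERM", "id")
id_scan(n, "count")
print(DISPLAY[n.rule](n))  # count
```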

D. THE LANGUAGE DEFINITION MODULE

The Language Definition Module is the grammatical
database utilized by the Grammar Directed Editor in the
construction and evaluation of an AST. The Language Definition
Module has a fixed and an interchangeable component. The
fixed component consists of the system predefined rules and
functions. The interchangeable component, known as the
Language Definition, consists of the language-specific
grammar rules, templates, and schemas. In addition, the
Language Definition may optionally include user-supplied
predefined rules and functions supplementing or superseding
those permanently installed in the system.

1. The Language Definition

The primary component of the Language Definition is
the internal representation of the language-specific grammar
as an ordered collection of grammar rules and their
associated templates and schemas. The Language Definition, apart
from user-supplied predefined rules and functions, consists
of a Rule Tree and a string table. The string table
contains the character string representation of the templates
and schemas for each rule. The Rule Tree is the ordering
mechanism for the grammar rules which provides access to the
templates and schemas in the string table. The Rule Tree is
a four-tiered hierarchy, the uppermost level of which is a
head node for the tree. The next level consists of a
sequence of head nodes, one for each defined grammar rule.
Under each grammar rule node is a pair of head nodes, the
first for the templates associated with the rule and the
second for the schemas. The fourth, bottom-most tier
consists of leaf nodes containing pointers to the template and
schema strings stored in the string table. The regularity
designed into the template and schema definitions for each
of the rule types allows the Editor to access any leaf of the
Rule Tree utilizing only the operator and rulename
information in an AST node label.
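A flattened sketch of the Rule Tree and string table follows. It is an assumption-laden illustration, not the module's layout: the root is modelled as a dictionary, each rule node holds two lists of string-table indices in place of the template and schema head nodes, and the mapping from operator to template (NT selects the first template; ALT, IOPT, LOPT and TERM select the second) follows the template numbering used earlier.

```python
from dataclasses import dataclass, field
from typing import Optional

string_table = []  # character strings of all templates and schemas


@dataclass
class RuleNode:
    """Second tier: one node per defined grammar rule."""
    name: str
    templates: list = field(default_factory=list)  # indices into string_table
    schemas: list = field(default_factory=list)    # indices into string_table


rule_tree = {}  # first tier: the head node of the Rule Tree


def add_rule(name: str, templates: list, schemas: list) -> None:
    node = RuleNode(name)
    for text in templates:
        string_table.append(text)
        node.templates.append(len(string_table) - 1)
    for text in schemas:
        string_table.append(text)
        node.schemas.append(len(string_table) - 1)
    rule_tree[name] = node


def template_for(op: str, rulename: str) -> Optional[str]:
    """Find the template for an AST label from its operator and rulename."""
    if op == "COPT":
        return None          # implicit template: never stored
    rule = rule_tree.get(rulename)
    if rule is None:
        return None          # undefined rule: implicit null transformation
    k = 0 if op == "NT" else 1
    return string_table[rule.templates[k]]


add_rule("stmts",
         ["NT,stmts => ITER,stmts ( NT,stmt ; IOPT,stmts )",
          "IOPT,stmts => NT,stmt ; IOPT,stmts"],
         ["$1", '"[stmts]"'])
print(template_for("IOPT", "stmts"))  # IOPT,stmts => NT,stmt ; IOPT,stmts
```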

Appendix D is an Intermediate-Level Language
Definition Grammar. Encoded by hand into a Language Definition as
shown in Appendix E, the ILD Grammar provides the means to
generate a Grammar Directed Editor for the construction of
ASTs representing language-specific Language Definitions.
When such an AST is evaluated by the predefined function
ILD, the result is a language-specific Language Definition
which may be installed in the Language Definition Module and
utilized to construct applications-oriented ASTs in the
language defined by the grammar. Appendix F presents a
simple example of such an applications-oriented Language
Definition from which ASTs representing strictly formatted
memoranda may be constructed utilizing the GDE.

The ILD Grammar allows definition of grammars on an
assembly-language level, i.e., many details which are
computable from the R-ARGOT grammar rule must be entered by the
user. For example, in the construction of an iteration rule
the user is required to enter "rulename1" and "i-rulename"
in a consistent manner throughout the formation of the
templates and schemas. However, at this low level the
mechanisms for checking such consistency do not exist. Thus the
ILD Grammar is seen as a flexible but error-prone tool
suitable for use primarily as a bootstrap mechanism for the
definition and implementation of a High-Level Language
Definition Grammar which automatically derives as much
information from the R-ARGOT rule as is possible. For
grammars in which all nonterminal children of concatenation
rules are to be created and displayed in the order listed in
the rule, an extended R-ARGOT notation which provided the
facility for inclusion of format control information and a
means for specification of predefined functions as head
nodes of concatenations would allow such automatic
derivation. Development of such an extended notation, as well as
the corresponding HLD Grammar and function, is deferred
until the symbol table and evaluator designs are complete.

2. Predefined Rules

The set of system predefined rules provides the user
a mechanism for entering strings representing simple, common
constructs, such as identifiers and numbers, as well as more
involved constructs, such as expressions, which, even though
composed of many parts and perhaps generating multinode
subtrees in the AST, may be most conveniently viewed by the
user as representing single logical units. Predefined rules
are built-in, optional extensions to the Language Definition
which provide the language implementor with a set of
primitives upon which he may base his grammatical constructs.
The set of predefined rules is modifiable and extensible by
the language implementor through the inclusion, as an adjunct
to the grammar definition, of a set of predefined rules which
supersede or complement the set permanently installed in the
Language Definition Module.

Predefined rules may be viewed as a deviation from
the grammar-directed editing philosophy espoused throughout
this work. The use of predefined rules allows the entry,
after all, of syntactically incorrect strings which are not
immediately (in the sense of character-at-a-time immediacy)
detected and rejected as invalid. For example, compare a
"pure", character-at-a-time grammar-directed editor with a
predefined-rule-augmented GDE on the terminal <string>,
defined for illustration to be the concatenation of any
characters except a space, and terminated by a carriage
return. In the pure system, each character is examined and
its validity checked as it is typed. In this example, if
the user enters a string of valid characters and then a
space, he is immediately informed that the space is
unacceptable and is able to proceed without retyping the
portion of the string thus far entered. The predefined rule
system, however, would require that the entire string of
symbols, including the incorrect space, be entered before
rejecting it, and the user would have to retype the
corrected string in its entirety.
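The contrast in this example can be made concrete with a small C sketch. It assumes the illustrative `<string>` terminal (any character but a space, ended by a carriage return); the hypothetical `pure_accept_prefix` plays the character-at-a-time editor, and the hypothetical `predefined_accept` plays the predefined rule, which leaves the AST leaf untouched unless the entire line is valid.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: function names are hypothetical. */

/* Validity of one character of <string>: anything except a space. */
int string_char_ok(char c)
{
    return c != ' ';
}

/* "Pure" editor: each keystroke is checked as it is typed. Returns how
 * many characters are accepted before the first invalid one; the user
 * keeps that prefix and continues from there. */
size_t pure_accept_prefix(const char *typed)
{
    size_t n = 0;
    while (typed[n] != '\0' && string_char_ok(typed[n]))
        n++;
    return n;
}

/* Predefined-rule editor: the whole line, up to the carriage return,
 * is checked at once. On failure nothing is kept and the AST leaf is
 * unchanged; on success the string is copied into the leaf. */
int predefined_accept(const char *line, char *leaf, size_t leaf_size)
{
    size_t len = strcspn(line, "\r");
    size_t i;
    for (i = 0; i < len; i++)
        if (!string_char_ok(line[i]))
            return 0;        /* reject: the user retypes everything */
    if (len >= leaf_size)
        return 0;            /* would not fit in the leaf: reject too */
    memcpy(leaf, line, len);
    leaf[len] = '\0';
    return 1;
}
```

With the input `abc def`, the pure editor keeps the three characters `abc`, while the predefined rule rejects the whole line and the leaf keeps its previous contents.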

We grant that grammar-directed editing down to the
smallest indivisible unit, the character, has a certain
appeal. However, our predefined rule compromise is
motivated by several advantages and mitigating arguments:
a. The time lapse between entering even a large
predefined rule input string, such as a complex expression,
and re-entering it if it is rejected as incorrect, is short.

b. The time lost in a predefined rule system in
retyping the usually short input strings accepted by most
predefined rules is offset by the time that would be lost in
a pure system that requires control characters to guide the
tree building via the language definition through the
various alternatives involved in the larger grammatical
constructs, such as expressions, that can easily be handled
by predefined rules.

c. The syntactic integrity of the AST is always
preserved by the system predefined rules, since no change to
the AST is made until the syntactic validity of the entire
input string is confirmed.

d. Predefined rules simplify the language
implementor's task by raising the level of the lowest
grammatical constructs that must be defined in the grammar.
Instead of having to work clear down to the character level,
predefined rules provide as primitives the facilities for
handling groups of characters, such as numbers, identifiers,
and strings, which are the basic building blocks of data
structures in general and programs in particular.

e. Given automatic lexical analyzer and parser
generators, predefined rules for the class of grammatical
constructs envisioned are easily built.
f. The suitable choice of predefined rules frees
the language implementor from long-winded, needlessly
detailed grammatical constructions for a wide variety of
regularly-expressible productions. Grammars for language
definitions, given such a set of easily understandable
primitive constructions, would be more transparent and
easier for the user to assimilate.

It is recognized that taking the predefined rule
approach to its extreme limits could result in a
compiler-like editor wherein huge segments are submitted for
analysis to exceedingly complex predefined rules, thereby
negating the benefits to be gained from a more rational
grammar-directed editing environment. However, within the
guidelines presented here, the predefined rule approach has
distinct advantages and leaves avenues of exploration open
to the language implementor.

3. Predefined Functions

Nodes in the AST undergoing evaluation fall into
one of three categories: undefined, head, and function. The
class of undefined nodes includes all free nodes which may
still exist in the AST. Head nodes are the HEAD, ITER,
and LIST operator nodes created for synthesis of the AST,
all of which the evaluator treats alike. Head nodes
have no computational capabilities during the evaluation
process but rather provide structure to the AST. Function
nodes have as their operator one of the predefined
functions. Function nodes are generated by concatenation
and predefined rules during synthesis of the AST and result
in calls to the corresponding predefined function during
evaluation. Function nodes may be leaves, as in nodes which
reference symbol table entries, or they may be interior
nodes. If interior, function nodes must have the number,
order, and type of subtrees expected by the predefined
function.
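A minimal C sketch of how an evaluator might dispatch on the three node categories, assuming a node layout and two predefined functions (`const` and `add`) invented for the example: free nodes stop evaluation, head nodes merely evaluate their children in order (yielding the last child's value, an arbitrary choice made here), and function nodes call their predefined function after checking the number of subtrees. Checks on subtree order and type are omitted, and all names are hypothetical.

```c
#include <stddef.h>

/* Illustrative sketch only: node layout and names are hypothetical. */
enum node_kind { NODE_UNDEFINED, NODE_HEAD, NODE_FUNCTION };

struct node;
/* A predefined function receives its node and its evaluated subtrees. */
typedef long (*predef_fn)(const struct node *self, const long *args);

struct predef {
    const char *name;
    size_t arity;              /* number of subtrees expected */
    predef_fn fn;
};

struct node {
    enum node_kind kind;
    const struct predef *op;   /* function nodes only */
    long value;                /* payload for leaves such as constants */
    struct node **child;
    size_t nchild;
};

#define MAX_ARGS 8

/* Evaluate n into *out; returns 0 on success, -1 if a free node is met
 * or a function node has the wrong number of subtrees. */
int eval(const struct node *n, long *out)
{
    long args[MAX_ARGS];
    size_t i;

    switch (n->kind) {
    case NODE_UNDEFINED:
        return -1;                          /* free node: cannot evaluate */
    case NODE_HEAD:                         /* HEAD, ITER, LIST alike */
        *out = 0;
        for (i = 0; i < n->nchild; i++)     /* structure only */
            if (eval(n->child[i], out) != 0)
                return -1;
        return 0;
    case NODE_FUNCTION:
        if (n->op == NULL || n->nchild != n->op->arity || n->nchild > MAX_ARGS)
            return -1;
        for (i = 0; i < n->nchild; i++)
            if (eval(n->child[i], &args[i]) != 0)
                return -1;
        *out = n->op->fn(n, args);          /* call the predefined function */
        return 0;
    }
    return -1;
}

/* Two hypothetical predefined functions. */
long f_const(const struct node *self, const long *args) { (void)args; return self->value; }
long f_add(const struct node *self, const long *args) { (void)self; return args[0] + args[1]; }

const struct predef P_CONST = { "const", 0, f_const };
const struct predef P_ADD   = { "add",   2, f_add   };
```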

The set of predefined functions defines the range of
computational power available to the evaluator and thus
limits the capabilities available to the user of the GDE. A
proposed set of system predefined functions, based on the
primitives discussed throughout [Pratt, 1975], is presented
in Appendix G. This set of system functions may be
augmented by the language implementor through additional or
superseding function definitions included as extensions in
the Language Definition.
IV. PROGRAMS AS DATABASES



A. INTRODUCTION 

The material contained in this chapter was originally
developed during the search for a solution to a particular
problem: namely, that of storing the tree representation of
the synthesized program in secondary storage, with
complicated links to other data structures recorded in the
leaves, in such a way that pointer and reference integrity
could be maintained. This problem is aggravated by the
consideration that such a stored structure might well be
reloaded at a time when the physical contents of shared
memory spaces currently in use by the system are quite
different from the environment existing at the time that the
tree structure was originally created.

Once this problem was recognized as being a database
management problem, to which known techniques of database
design were applicable, the solution was straightforward.
The database design techniques described throughout this
chapter are taken from [Kroenke 1977]. The relatively
unorthodox view of programs as complex databases afforded by
this insight, however, is of more general interest, since it
provides a new perspective on the nature of programming
systems. In particular, these considerations provide some
justification for the hope that grammar-driven tree
synthesizers are capable of building up a language-
independent semantic structure.

B. PROGRAMS AS COMPLEX RELATIONSHIPS 

In viewing programs as databases, we first recognize
that the semantic contents of a program must be accessed by
two entities: the human reader or writer, and the processor
intended to execute the program. Comments excluded, the
information available to these two entities is almost
identical: that is, the human user can predict exactly the
operation of the processor for a given program, and the
processor deterministically executes the encoded intentions
of the programmer. So, without loss of generality, we may
initially consider the program as a database accessed by the
processor. In the case of a machine language program, the
processor is the real machine on which the program is to
execute. For a higher-level language, the processor is the
hardware-software combination, or virtual machine, which is
capable of translating and executing the program.

The "semantic content" of the program is the collection
of potential evaluations which the processor may be required
to perform throughout the course of execution. For the
moment, we disregard the order of execution. Each
evaluation consists of the selection of one of many
primitive operations which the processor is capable of
performing, and the application of that chosen primitive
operation to a number of arguments, contained in one or more
registers or memory locations addressable in some way.

Upon reflection, it is clear that both the set of
primitive operations and the set of addressable memory
locations are databases in their own right. The keyname, or
code by which an entry can be uniquely located, for the set
of primitive operations is the operation name, or opcode,
and that for the collection of potential arguments is the
address.

Clearly, the set of potential evaluations is, in the
terminology of database theory, a complex (many-to-many)
relationship between primitive operations and registers. A
given operation may be applied to many different sets of
arguments within the course of a program execution, and a
given register may be the argument for a number of different
operations. There is no functional relationship between
items of the two databases in either direction, which means
that neither keyname can be used to uniquely identify an
item in the complex relationship between them.
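To make the many-to-many character of the evaluation relationship concrete, here is a small C sketch, assuming single-argument evaluations and made-up opcode names: whenever either count below exceeds one, that keyname alone cannot single out a member of the relationship. The types and functions are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: one argument per evaluation, hypothetical names. */
struct evaluation {
    const char *opcode;   /* keyname in the operation database */
    int reg;              /* keyname (address) in the register database */
};

/* Number of members of the relationship that use opcode op. */
size_t count_by_opcode(const struct evaluation *e, size_t n, const char *op)
{
    size_t i, k = 0;
    for (i = 0; i < n; i++)
        if (strcmp(e[i].opcode, op) == 0)
            k++;
    return k;
}

/* Number of members of the relationship that use register reg. */
size_t count_by_register(const struct evaluation *e, size_t n, int reg)
{
    size_t i, k = 0;
    for (i = 0; i < n; i++)
        if (e[i].reg == reg)
            k++;
    return k;
}
```

For the evaluations (ADD, r1), (ADD, r2), (NEG, r1), both `count_by_opcode(e, 3, "ADD")` and `count_by_register(e, 3, 1)` return 2, so neither keyname is functional.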

C. DECOMPOSITION OF THE EVALUATION RELATION 

Standard database design techniques specify several ways
by which each of the elements of a complex relationship
between two databases can be referred to in a systematic and
unambiguous way during database access. Two general methods
of approach are used. One is to (arbitrarily) force the
relationship to be simple (many-to-one in one direction
only), by rejecting from the allowed range of possibilities
any members of the relationship which would cause the
relationship to be complex. In this case, the keyname for
one of the underlying databases can be used to unambiguously
refer to members of the relationship as well. The second
method is to decompose the relationship into two simple
relationships by constructing an intersection database.

There exist programming systems in which the first
strategy is adopted. For instance, if the restriction is
made that registers may not be re-used, so that one, and
only one, primitive operation is applied to a given
register, a purely functional, or no-assignment, programming
system is obtained. In such a system, the only named
semantic elements are functions and constants (which may be
regarded as functions). Registers need not be named since
whenever one is needed, it can be drawn from a pool, used
once, and discarded by the processor.
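The no-re-use restriction can be checked mechanically. The C sketch below assumes each instruction names a single destination register, numbered from 0 in a small fixed register file; it tests whether any register is the target of more than one operation, and shows the anonymous pool from which such a processor could draw fresh registers. All names are hypothetical.

```c
#include <stddef.h>

/* Illustrative sketch only: instruction layout and names are hypothetical. */
struct instr {
    int op;    /* primitive operation */
    int dst;   /* register receiving the result */
};

#define MAX_REGS 64

/* Returns 1 if every register 0 .. nregs-1 is the target of at most one
 * operation (the no-assignment restriction), 0 if some register is
 * re-used, a register number is out of range, or nregs is too large. */
int is_single_assignment(const struct instr *p, size_t n, int nregs)
{
    unsigned char used[MAX_REGS] = { 0 };
    size_t i;
    if (nregs > MAX_REGS)
        return 0;
    for (i = 0; i < n; i++) {
        if (p[i].dst < 0 || p[i].dst >= nregs || used[p[i].dst])
            return 0;
        used[p[i].dst] = 1;
    }
    return 1;
}

/* Under the restriction registers need no names: each result simply
 * receives the next unused register from the pool. */
int fresh_register(int *next)
{
    return (*next)++;
}
```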

This approach is considered mathematically elegant, but 
it is not much in use in non-academic programming systems. 

In the second approach, an intersection database is
created, consisting of one entry for each distinct member of
the complex relationship. As a minimum, in order to allow
reference to the generating databases, each entry in the
intersection database must contain the keynames for those
entries in the original data sets with which it is
associated. Thus, for a programming notation, each entry in
the intersection database must contain, at a minimum, an
opcode and a register address for each argument, in some
form.

The archetypical entry for the intersection database
corresponding to the evaluation relationship is thus:

     OPCODE  ADDRESS(1)  ADDRESS(2)  ...  ADDRESS(N)

This format is recognizable as the atomic unit of notation
for most common programming systems, from machine code to
high-level languages. Each such entry corresponds to
what is normally referred to as an instruction. In summary,
we assert that a program is nothing more than the
intersection database for instances of the evaluation of
accessible operands by the primitive operations available to
the evaluating processor.
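In C, an intersection-database entry of this shape might be sketched as follows, assuming at most three argument addresses and integer keynames for both opcodes and addresses. Each entry names one opcode and one address per slot, so the two derived relationships (entry to operation, entry to address) are each many-to-one; `refs_to_address` walks the second one backwards. Names and limits are hypothetical.

```c
#include <stddef.h>

/* Illustrative sketch only: limits and names are hypothetical. */
#define MAX_ARGS 3

/* One entry: OPCODE ADDRESS(1) ... ADDRESS(N). */
struct instruction {
    int opcode;              /* keyname in the operation database */
    size_t nargs;            /* N, at most MAX_ARGS */
    int addr[MAX_ARGS];      /* keynames in the address database */
};

/* Entry -> operation is a function: each entry has exactly one opcode. */
int opcode_of(const struct instruction *prog, size_t i)
{
    return prog[i].opcode;
}

/* Collect indices of the instructions that reference address a, storing
 * at most max of them in out; returns the total number found, which may
 * exceed max. */
size_t refs_to_address(const struct instruction *prog, size_t n, int a,
                       size_t *out, size_t max)
{
    size_t i, j, k = 0;
    for (i = 0; i < n; i++)
        for (j = 0; j < prog[i].nargs; j++)
            if (prog[i].addr[j] == a) {
                if (k < max)
                    out[k] = i;
                k++;
                break;       /* count each instruction once */
            }
    return k;
}
```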

D. CONTROL STRUCTURE 

We have heretofore ignored the question of how the order
of execution of the evaluations is to be specified within
the program (the basic elements of which are now seen to be
entries in an intersection database). This order
corresponds to the logical access sequence of the set of
instructions. Thus, we may equate the ordinary notion of
the control structure of a program to the database-oriented
notion of a logical access structure for the program
database. The simplest access mechanism for a database is
to order it as a simple sequence. Under this protocol, the
elements of the database will be presented to the accessing
entity in a strictly invariant sequence.

Such an accessing structure is realized in such simple
programming systems as that of a keystroke-programmable
calculator. A sequence of keystrokes can be entered and
automatically reproduced at will, but there is no
possibility of automated branching.

Such programming systems are fundamentally limited in
mathematical computational power. The simplest modification
to such an access regime is to allow conditional branching,
so that a part of the instruction sequence may be repeated
or skipped, based on the contents of a register at the time
the branch is reached.

Machine and assembly-level programming systems, as well
as such high-level languages as BASIC and FORTRAN, are
organized on such a plan.
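The following C sketch illustrates sequential access extended by a single conditional branch. The five-operation instruction set is invented for the example; with `OP_JNZ` removed, it would reduce to the keystroke-calculator regime of a strictly invariant sequence.

```c
#include <stddef.h>

/* Illustrative sketch only: a hypothetical instruction set. */
enum opcode { OP_SET, OP_ADD, OP_DEC, OP_JNZ, OP_HALT };

struct instruction {
    enum opcode op;
    int a, b;        /* register numbers, or a constant / jump target */
};

/* Execute instructions in sequence; OP_JNZ transfers control to
 * instruction b when register a is nonzero. Returns the number of
 * instructions executed, or -1 if max_steps is exceeded. */
long run(const struct instruction *prog, size_t n, long *reg, long max_steps)
{
    size_t pc = 0;
    long steps = 0;
    while (pc < n) {
        const struct instruction *in = &prog[pc];
        if (++steps > max_steps)
            return -1;                       /* guard against runaway loops */
        pc++;                                /* default: next in sequence */
        switch (in->op) {
        case OP_SET:  reg[in->a] = in->b;         break;
        case OP_ADD:  reg[in->a] += reg[in->b];   break;
        case OP_DEC:  reg[in->a]--;               break;
        case OP_JNZ:  if (reg[in->a] != 0) pc = (size_t)in->b; break;
        case OP_HALT: return steps;
        }
    }
    return steps;
}
```

For instance, the program SET r0,5; SET r1,0; ADD r1,r0; DEC r0; JNZ r0,2; HALT leaves 15 (the sum of 1 through 5) in r1 after 18 instruction executions.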

E. STRUCTURED PROGRAMMING SYSTEMS 

The disadvantage of a sequential access mechanism is
that the resulting database does not have local integrity.
Instruction sequences which may be logically adjacent under
certain circumstances are not necessarily physically
adjacent. This access organization presents no real
disadvantages for the machine processor with a random-access
architecture, but can be quite confusing for the human
programmer. To render the program database more accessible
to the user, the notion of structured programming was
developed. This organizational technique consists of
organizing the access of a program database in a
hierarchical (tree-like) manner, so that program control
follows a hierarchical program structure which can be
expressed as a string generated by a context-free grammar
(and thus has an associated physically hierarchical
structure induced by the grammar). Such program control
facilities as functions and subroutines were the earliest
"structured constructs". The syntax of such languages as
PASCAL and ALGOL, however, was consciously designed to
facilitate the expression of a hierarchical control
structure, and to make the expression of a disordered,
sequential control structure less attractive than the use of
"structured" control operators.

It is this historical development which encourages us
to hope that a language-independent semantic tree structure
may be built using a grammar-driven tree editor. Basically,
we note that it has become a conscious design principle in
the development of structured programming languages to
ensure that program control flow follows the syntactic
organization of the language. The underlying sets of
primitive operators of these languages have a great deal in
common. Language-dependent primitives can be added to the
set available to the processor and evaluated without regard
to the specific syntax by which they are expressed, provided
that the overall control structure of such additional
primitives is also hierarchically organized.

F. PHYSICAL REPRESENTATION OF A TREE-STRUCTURED PROGRAM 

We are left with the problem of physically representing
a tree-structured program in a sequentially organized
physical memory space. The problems encountered are
precisely those encountered when attempting to implement any
hierarchically organized intersection set. They stem from
the requirement to refer, directly or indirectly, to the
entries in the parent databases from more than one place in
the intersection database. Two general strategies, each
with its own advantages and disadvantages, are currently in
use in database management systems.

1. Sequential Tree Representation

This strategy is implemented by representing the
tree as a linear list of nodes and their contents in
preorder sequence. References to the parent databases are
embedded in the listing by keyname. The complexity of the
relationship implies that each such keyname must be repeated
many times throughout the list. Special delimiters are used
between node listings to indicate whether the next node is a
child, sibling, or uncle of the last. If one of the
keynames is to be changed, a search of the listing must be
made to find all of its occurrences. A second major
disadvantage is that in order to access any part of the
list, the list must be traversed sequentially from the
beginning. On the other hand, no pointers need occur
anywhere in the list, so that it can be moved about freely
from one place to another without change.
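As an illustration of the sequential representation, this C sketch writes a first-child/next-sibling tree in preorder using an assumed delimiter convention: `(` when the next node is a child of the last, `,` when it is a sibling, and `)` for each level climbed, so `),` reaches an uncle. The node type and function names are hypothetical. Note that the keyname `x` is repeated for each leaf that refers to it, which is exactly the repetition described above.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: node type, names, and delimiters are assumed. */
struct tnode {
    const char *key;             /* keyname of the referenced record */
    struct tnode *first_child;
    struct tnode *next_sibling;
};

/* Append s to the NUL-terminated buf of capacity cap, truncating
 * rather than overflowing. */
static void put(char *buf, size_t cap, const char *s)
{
    size_t len = strlen(buf), add = strlen(s);
    if (len + add >= cap)
        add = cap - len - 1;
    memcpy(buf + len, s, add);
    buf[len + add] = '\0';
}

/* Emit the subtree rooted at n in preorder, pointer-free. The caller
 * must pass an initially empty buffer (buf[0] == '\0'). */
void emit_preorder(const struct tnode *n, char *buf, size_t cap)
{
    const struct tnode *c;
    put(buf, cap, n->key);
    if (n->first_child == NULL)
        return;
    put(buf, cap, "(");                      /* next node is a child */
    for (c = n->first_child; c != NULL; c = c->next_sibling) {
        emit_preorder(c, buf, cap);
        if (c->next_sibling != NULL)
            put(buf, cap, ",");              /* next node is a sibling */
    }
    put(buf, cap, ")");                      /* climb one level */
}
```

The tree for `assign(x, plus(x, 1))` is emitted as the string `assign(x,plus(x,1))`.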

2. Linked Representation of Trees

Trees are represented in this strategy by nodes
linked together using pointer fields within each node. A
pointer is either the absolute address of the entity pointed
to, or an offset or array subscript which can be used by
routines in the system to calculate such an address. The
salient feature of a pointer reference is that it allows
reference by some mechanism which is independent of the
value of the referenced entity. Thus, the value of the
entity itself can be changed without changing all of the
references to it, which are still valid (provided, of
course, that the change is made without physically moving
the changed record). When the tree itself is represented by
means of nodes linked with pointers, it is common to link
the leaves of the tree to the parent databases with pointers
as well. It is assumed that a means exists to distinguish
such external links from the internal links defining the
tree structure itself. This representation has as one major
advantage the ability to be quickly traversed (by following
pointers). Another major advantage of this strategy is that
information in referenced databases need only be recorded
once, and can be changed without updating any pointers.
Deletion of information is somewhat more difficult, but can
be accomplished by constructing and maintaining
cross-reference lists (inverted lists) which contain pointer
references to all nodes in the tree referring to a given
record in the parent database. The primary disadvantage of
such a representation is that the structure cannot be moved
or stored without a great deal of pointer modification. The
use of relative pointers is an inadequate solution, since
the consistency of references to the parent databases, which
need to be moved and managed as separate entities, must
still be maintained.
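The C sketch below illustrates the linked strategy, assuming a symbol-table record with a fixed-size inverted list: leaves hold pointers to the record, so renaming it is a single change visible through every leaf, and deleting it uses the inverted list to find and clear each referring leaf. The types, limits, and names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: types, limits, and names are hypothetical. */
#define MAX_REFS 16

struct leaf;

/* A record in a parent database, e.g. a symbol table entry. */
struct symbol {
    char name[32];
    struct leaf *refs[MAX_REFS];   /* inverted (cross-reference) list */
    size_t nrefs;
};

/* A tree leaf linked to its record by pointer rather than keyname. */
struct leaf {
    struct symbol *sym;
};

/* Link a leaf to a record and record the back-reference. */
int link_leaf(struct leaf *l, struct symbol *s)
{
    if (s->nrefs == MAX_REFS)
        return -1;
    l->sym = s;
    s->refs[s->nrefs++] = l;
    return 0;
}

/* One change to the record is seen through every pointer to it. */
void rename_symbol(struct symbol *s, const char *name)
{
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
}

/* Deletion: the inverted list locates every leaf referring to s. */
void delete_symbol(struct symbol *s)
{
    size_t i;
    for (i = 0; i < s->nrefs; i++)
        s->refs[i]->sym = NULL;
    s->nrefs = 0;
}
```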

3. A Hybrid Strategy for Tree Representation

An examination of these characteristics indicates
that the linked representation is preferable when changes
are to be made to either the parent or tree databases, but
that the sequential representation is preferable when the
database is to be transmitted from one location to another,
or stored unchanged for a relatively long period of time.
(Storage is equivalent to transmission from one time to
another, and is thus logically the same problem as that of
movement.)

We conclude that the linked representation is an
appropriate representation for the program tree during
synthesis and evaluation, but that the program tree should
be moved (or stored on secondary storage) in sequential,
pointer-free format. Links to the parent databases are
converted from pointer references to reference by keyname.
The next section addresses the problem of how conversion
between the two representations can, in general terms, be
accomplished.

G. PROCEDURAL REPRESENTATION OF DATA 

In order to incorporate these ideas into a feasible
design, we consider the facilities that would have to exist
in such a system. Since the program tree is to be operated
on in main memory with a linked representation, we may
assume that a data manipulation package exists which is
capable of synthesizing and maintaining all of the pointers
required to keep the linked structures coherent and
consistent. Consider the process of retrieving a sequentially
organized tree structure from secondary storage and loading
it into internal memory. This process must consist of
ordering a particular series of function activations, with
particular arguments, from the data manipulation package,
causing the desired structure to be built within physical
memory. The sequential representation is seen to be nothing
but a program for the data manipulation package, which is
itself a processor with a number of primitive operations.

Moreover, a strictly sequential control protocol for
this program is possible, given a reasonably powerful set of
primitives in the data manipulation package, since a tree
can be synthesized in strict preorder sequence (the parent
for each child exists at the time of the child's synthesis).

We conclude that the appropriate secondary
representation for a program tree is as a sequential list of
instructions, to be translated by some simple interpreter
into a series of calls to the data manipulation package.
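Treating the sequential representation as a program can be sketched in C as follows. The two-instruction stream format (`N key` to create a child of the current node and descend into it, `U` to return to the parent) and the fixed-size node pool standing in for the data manipulation package are assumptions made for illustration, not the system's actual format. Because the stream is in preorder, every parent exists before its children are created.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: stream format, layout, and names are hypothetical. */
#define POOL 64
#define KEYLEN 16

struct node {
    char key[KEYLEN];
    struct node *parent, *first_child, *last_child, *next_sibling;
};

/* Stand-in for the data manipulation package. */
struct dm {
    struct node pool[POOL];
    size_t used;
};

/* Primitive: attach a new last child to parent (NULL makes a root).
 * All pointer maintenance happens here, never in the interpreter. */
struct node *dm_new_child(struct dm *d, struct node *parent,
                          const char *key, size_t keylen)
{
    struct node *n;
    if (d->used == POOL)
        return NULL;
    n = &d->pool[d->used++];
    memset(n, 0, sizeof *n);
    if (keylen >= KEYLEN)
        keylen = KEYLEN - 1;                 /* truncate long keys */
    memcpy(n->key, key, keylen);
    n->parent = parent;
    if (parent != NULL) {
        if (parent->last_child != NULL)
            parent->last_child->next_sibling = n;
        else
            parent->first_child = n;
        parent->last_child = n;
    }
    return n;
}

/* Onload: interpret "N key\n" and "U\n" instructions in order.
 * Returns the root, or NULL on a malformed stream or exhausted pool. */
struct node *onload(struct dm *d, const char *s)
{
    struct node *root = NULL, *cur = NULL;
    while (*s != '\0') {
        size_t len = strcspn(s, "\n");
        if (s[0] == 'N' && len > 2 && s[1] == ' ') {
            if (cur == NULL && root != NULL)
                return NULL;                 /* only one root allowed */
            cur = dm_new_child(d, cur, s + 2, len - 2);
            if (cur == NULL)
                return NULL;
            if (root == NULL)
                root = cur;
        } else if (s[0] == 'U' && len == 1 && cur != NULL) {
            cur = cur->parent;
        } else {
            return NULL;
        }
        s += len;
        if (*s == '\n')
            s++;
    }
    return root;
}
```

The stream `N assign`, `N x`, `U`, `N plus`, `N x`, `U`, `N 1`, `U`, `U`, `U` (one instruction per line) rebuilds the tree for `assign(x, plus(x, 1))` from five nodes.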

The offload, or transmit, process consists of a preorder
traversal of the linked representation, emitting the
appropriate instructions for recreating the skeleton of the
tree and filling in the contents of each node as it is
reached. At the same time, references can be removed from
the appropriate cross-reference lists, triggering removal of
the data item from the parent database when a reference
count of zero is reached. During onload, the skeletal
structure of the tree is recreated, and external references
in symbolic form are reloaded into the appropriate parent
database. Pointer and cross-reference list creation and
maintenance are performed automatically by the pre-existing
data manipulation package.
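The offload process just described can be sketched in C as a preorder traversal emitting a hypothetical instruction format: `N label` to create a child and descend, `U` to climb back. Each external pointer is replaced by its keyname (marked here with a `$` prefix, an assumed convention for telling external references from node labels), and the referenced record's count is decremented, the record being marked for removal when the count reaches zero. Types and names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: stream format, types, and names are hypothetical. */

/* A parent-database record with a reference count. */
struct symbol {
    const char *name;      /* keyname written to the stream */
    int refcount;
    int live;              /* cleared when the record is removed */
};

/* Linked tree node; a leaf may point to a symbol (external link). */
struct node {
    const char *label;     /* node operator, e.g. "plus" */
    struct symbol *sym;    /* external link, or NULL */
    struct node *first_child, *next_sibling;
};

/* Append s to the NUL-terminated buf of capacity cap, truncating. */
static void put(char *buf, size_t cap, const char *s)
{
    size_t len = strlen(buf), add = strlen(s);
    if (len + add >= cap)
        add = cap - len - 1;
    memcpy(buf + len, s, add);
    buf[len + add] = '\0';
}

/* Offload n in preorder into buf (initially ""). External pointers are
 * replaced by keynames and dropped from the record's reference count. */
void offload(struct node *n, char *buf, size_t cap)
{
    struct node *c;
    put(buf, cap, "N ");
    if (n->sym != NULL) {
        put(buf, cap, "$");
        put(buf, cap, n->sym->name);
        if (--n->sym->refcount == 0)
            n->sym->live = 0;              /* last reference: remove record */
        n->sym = NULL;
    } else {
        put(buf, cap, n->label);
    }
    put(buf, cap, "\n");
    for (c = n->first_child; c != NULL; c = c->next_sibling)
        offload(c, buf, cap);
    put(buf, cap, "U\n");
}
```

For `assign(x, plus(x, 1))` with both `x` leaves linked to one record of count 2, the output is `N assign`, `N $x`, `U`, `N plus`, `N $x`, `U`, `N 1`, `U`, `U`, `U`, one per line, and the record's count falls to zero.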

The secondary representation can thus be viewed either
as data, representing the tree in linear format, or as a
program for the data structure manipulation package which
will cause a logically equivalent tree to be reconstructed
in available memory.

As a beneficial side effect, if the capability is
installed to allow the onload and offload translators to
read from or write to strings in main memory, the described
system provides an easy way to copy or move subtrees, as well
as to encode tree-building templates efficiently. In fact,
the proposed mechanism becomes the method of choice for any
and all movement of tree structures from one location or
time to another, since the data in the transmitted stream is
entirely logical, containing no reference to any
implementation details. The process would even allow
internal representations to be transmitted from one
installation to another with a completely different
implementation, since all implementation-dependent data is
removed during the offload process and reinserted during the
onload process.

H. SUMMARY 

In this section we have viewed programs as specialized
databases, and have found that standard database models
correspond nicely to various programming language styles.
Two fundamental conclusions have been reached. The first is
that it seems very likely that grammar-driven tree editors
can be used to produce trees representing the control
structures for common programming languages in a
syntax-independent, directly-evaluable format. This hope is
based on the direct expression of hierarchical control
structures by the syntactic hierarchy implicit in the
defining grammars of current programming languages, and the
recognition that a small set of such control structures
provides the common case for current language design.

The second result is the solution to a technical
problem: the appropriate format for such program trees
is linked form when the tree is undergoing modification,
and a sequential, procedural, pointer-free list of
instructions when the tree is being stored or transmitted
from one point to another.
V. A PROTOTYPE SYSTEM DESIGN



In this section, the design for a prototype system
demonstrating the feasibility of the ideas developed in
previous chapters is described. Since the implementation of
the described system is, at present, incomplete, the design
is presented only in broad outline. A full description of
the demonstration prototype will be provided as a Technical
Report when the initial implementation is complete.

The approach taken is to first describe a complete system
for a grammar-driven, language-independent programming
environment, and then select a subsystem for implementation
as a prototype feasibility study. The prototype subsystem
will be used to generate statistics concerning memory size
and computational efficiency, as well as to refine the user
interface, with the possibility remaining of extending the
prototype to a more complete implementation at a future
time.

A basic block diagram of the complete system is provided as 
Figure 5. 

A. SYSTEM MODULES. 

The proposed system consists of the following modules: 
1. Data Structure Support Module.

This module contains packages of functions, each
package implementing a specific abstract data type needed by
the remainder of the system. At a minimum, the abstract
data type packages needed include one supporting an
indefinite number of indefinitely large association lists
(to represent the contents of tree nodes), and one
supporting general ordered trees, optimized toward
reasonably efficient traversal in all directions. In
addition, the tree support package must include a facility
for linking the leaves of trees to other data items, such as
strings, symbol table entries, numerical contents, and so
on. Each tree node (internal as well as leaf) must be
linkable to an association list representing the contents of
the node.
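A minimal C sketch of the two required data types, assuming string keys and values and heap allocation: an ordered tree node (parent, first-child, and next-sibling links allow traversal in every direction) carrying an association list for its contents. The names are hypothetical, as are the attribute names used below.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch only: layout and names are hypothetical. */

/* One (attribute, value) pair in a node's association list. */
struct assoc {
    char *key;
    char *value;
    struct assoc *next;
};

/* General ordered tree node; every node, interior or leaf, may carry
 * an association list describing its contents. */
struct tree_node {
    struct tree_node *parent, *first_child, *next_sibling;
    struct assoc *contents;
};

/* Heap copy of a string (strdup is not part of standard C). */
static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Set key to value, replacing any existing value. Returns 0 on success. */
int assoc_put(struct tree_node *node, const char *key, const char *value)
{
    struct assoc *a;
    char *v = dup_str(value);
    if (v == NULL)
        return -1;
    for (a = node->contents; a != NULL; a = a->next)
        if (strcmp(a->key, key) == 0) {
            free(a->value);
            a->value = v;
            return 0;
        }
    a = malloc(sizeof *a);
    if (a == NULL || (a->key = dup_str(key)) == NULL) {
        free(a);
        free(v);
        return -1;
    }
    a->value = v;
    a->next = node->contents;
    node->contents = a;
    return 0;
}

/* Look up key in a node's contents; returns NULL if absent. */
const char *assoc_get(const struct tree_node *node, const char *key)
{
    const struct assoc *a;
    for (a = node->contents; a != NULL; a = a->next)
        if (strcmp(a->key, key) == 0)
            return a->value;
    return NULL;
}
```

For example, `assoc_put(&n, "operator", "LIST")` followed by `assoc_put(&n, "operator", "ITER")` leaves `assoc_get(&n, "operator")` returning `"ITER"`.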

In addition to supporting tree and association list
data types, this module is responsible for supporting any
additional data types for which the need arises and which
are not supported directly by the language used for
implementation. (In particular, the implementation
currently being developed requires a very primitive string
table which serves as a rudimentary symbol table.)

2. Grammar-Driven Environment Module.

This module provides an editor-like interface for
the user. It translates user commands into appropriate
system actions, which include editing functions, directives
to evaluate a particular program structure, and movement of
Abstract Syntax Trees from secondary to primary storage and
back again. A major component of this module is the
grammar-driven synthesizer itself.

3. Memory Management Module.

This module comprises the actual system primary
memory itself, which is used to store the LD (Language
Description) and AST (Abstract Syntax Tree) currently in
use. In addition, the primary memory module contains the
data structures being manipulated by the Data Structure
Support Module.

4. File Management Module.

This module implements a single-user workspace on
secondary storage which contains all of the LD's available
to the user, as well as all of the AST's which may have been
previously created and saved. These components are stored
in sequential, pointer-free format as discussed in Chapter
IV.

5. Input/Output Manifolds.

These modules manage the system input and output
streams, which may be redirected as required by components
of the system (including the user) to various physical
devices. The input stream may be taken from the keyboard, a
file on secondary storage, or a string in primary storage.
This assignment may be changed dynamically during the
operation of the system. Similarly, the output stream may
be dynamically directed to the CRT, a string in primary
storage, or a file on secondary storage. (The term
"manifold" is used to suggest that these functions may be
thought of as three-position switches, the setting of which
may be changed at will during system operation.)
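The manifold idea can be sketched in C for the input side, assuming the keyboard is read through `stdin`, as UNIX maps terminal devices onto files; the output side would be symmetric. The switch positions and function names are hypothetical.

```c
#include <stdio.h>

/* Illustrative sketch only: names and layout are hypothetical. */

/* The three positions of the input "switch". */
enum source_kind { FROM_KEYBOARD, FROM_FILE, FROM_STRING };

struct input_manifold {
    enum source_kind kind;
    FILE *fp;              /* used for FROM_FILE (keyboard uses stdin) */
    const char *str;       /* used for FROM_STRING */
    size_t pos;            /* read position within str */
};

/* Re-set the switch; may be called at any time during operation. */
void select_keyboard(struct input_manifold *m) { m->kind = FROM_KEYBOARD; }

void select_file(struct input_manifold *m, FILE *fp)
{
    m->kind = FROM_FILE;
    m->fp = fp;
}

void select_string(struct input_manifold *m, const char *s)
{
    m->kind = FROM_STRING;
    m->str = s;
    m->pos = 0;
}

/* Next character from whichever source is selected; EOF at its end. */
int manifold_getc(struct input_manifold *m)
{
    switch (m->kind) {
    case FROM_KEYBOARD:
        return getc(stdin);
    case FROM_FILE:
        return getc(m->fp);
    case FROM_STRING:
        if (m->str[m->pos] == '\0')
            return EOF;
        return (unsigned char)m->str[m->pos++];
    }
    return EOF;
}
```

Reading a template from primary memory then amounts to `select_string` followed by ordinary character reads, with no change to the code that consumes the stream.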

6. Onload and Offload Translators.

These modules, controlling the Data Structure
Support facilities, convert the sequential data
representations stored on secondary storage to the linked
representation needed when an LD or AST is loaded into
primary memory, and vice versa. As a secondary feature,
since the input and output streams may originate from or be
directed to internal strings, these modules can be used to
"quote" or "unquote" tree structures, as when a template is
translated into an actual subtree replacement.

B. PRE-EXISTING MODULES.

The current implementation is being made using the C
Programming Language on a PDP-11 with the UNIX Operating
System. (UNIX is a trademark held by Bell Laboratories,
Inc.) This software combination provides a C-accessible
interface to memory and file management facilities. In
addition, a complete library of string handling and
input/output functions is available. In consequence, the
memory and file management modules described above may be
thought of as already in existence, for the purpose of
describing the prototype subsystem. In addition, keyboard
and CRT interfaces are already operational: under the UNIX
operating system, hardware interfaces are mapped into the
system as files, with conversion routines provided
transparently. Thus, for the Input/Output Manifold module
we need only provide a means of diverting the input and
output streams from one file to another, or to main memory.

C. SUBSYSTEM SELECTION. 

Given the broad outline of system module function 
provided above, a minimally capable prototype subsystem can 
be selected for initial implementation. Such a subsystem 
must be capable of initialization, synthesis, display and 
storage of an AST in order to demonstrate convincingly the 
feasibility of the concepts outlined in previous chapters. 
Facilities to evaluate (execute), revise, and debug 
previously entered AST's may be deferred, as may the 
facility to easily install a new Language Definition.
Therefore, the capabilities provided by each of the modules 
in the prototype subsystem may be redefined as follows: 

1. Data Structure Support Module.

Full packages supporting general ordered trees and 
association lists are needed. In addition, a primitive 
capability to store and reference string values is needed. 
The capability to support sophisticated symbol table
structures may be deferred until such time as semantic
information is needed to allow execution of AST structures.

2. Grammar-Driven Environment Module.

The only major capability required by the prototype
subsystem is the "append" function, which can be used to
create AST structures. In addition, a working display
mechanism with simple cursor control facilities is needed.
A frame-oriented display mode is satisfactory for the
prototype system (although eventually a screen-oriented
display driver would be desirable). Finally, facilities for
storing and retrieving AST's to and from secondary storage,
as well as a facility (however cumbersome) for installing
new language definitions, are needed.

3. Input/Output Manifolds.

These modules need to be implemented in full, in 
order that secondary storage may be used, and in order to 
allow templates existing in primary memory to appear in the 
input stream for processing by the Onload translator. 

4. Onload and Offload Translators.

These components also must be fully implemented for
the same reason as the Input/Output Manifolds. The
implementation must be flexible enough that, as more
sophisticated data structure packages are added, the
syntax of the sequential representation can be extended to
accommodate onload and offload of keyfields in the new
structures.
5. Bootstrap Procedure.



The system can be initialized as follows. We
currently regard Language Definitions as being written in
one of three languages, or notational systems: a high-level
format (which is to consist of R-ARGOT notation with display
and semantic specification extensions), an intermediate-level
format (the notation developed in Chapter III), and a low-level
format (the sequentialized, pointer-free representation of an
internal tree corresponding to the desired LD, using the
language alluded to in Chapter IV).

There is no fundamental difference between the
intermediate and low-level formats, since they are two
alternative representations of the same database.
Translation from one format to the other is performed
automatically by the onload and offload translators when
this database is moved to and from secondary storage.
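The onload/offload pairing can be pictured as a round trip between a linked tree and pointer-free text. The sketch below assumes a hypothetical sequential format that writes each node as `op,rule` followed by its children in parentheses, separated by semicolons (punctuation borrowed from the template notation of Appendix C); it is not the Chapter IV language itself.

```python
from typing import List, Tuple

# Simplified linked form: (op, rule, children).
Tree = Tuple[str, str, List["Tree"]]


def offload(tree: Tree) -> str:
    """Write a tree as pointer-free text: op,rule(child;child;...)."""
    op, rule, kids = tree
    text = f"{op},{rule}"
    if kids:
        text += "(" + ";".join(offload(k) for k in kids) + ")"
    return text


def onload(text: str) -> Tree:
    """Rebuild the linked form from the sequential text."""
    pos = 0

    def node() -> Tree:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(;)":
            pos += 1
        op, sep, rule = text[start:pos].partition(",")
        if not sep:
            raise ValueError(f"expected 'op,rule' at offset {start}")
        kids: List[Tree] = []
        if pos < len(text) and text[pos] == "(":
            pos += 1                          # skip "("
            kids.append(node())
            while pos < len(text) and text[pos] == ";":
                pos += 1                      # skip ";"
                kids.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"missing ')' at offset {pos}")
            pos += 1                          # skip ")"
        return (op, rule, kids)

    tree = node()
    if pos != len(text):
        raise ValueError(f"unexpected text at offset {pos}")
    return tree
```

Because the text carries only preorder structure and labels, any pointers in the linked form are recreated on load rather than stored.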

To bootstrap the system, once all of the
modules have been compiled and linked, it is necessary only
to translate manually an intermediate-level description of the
intermediate-level language into the corresponding low-level
description, and to install the resulting text, using a
conventional editor, as a file accessible to the system.

At this point, the system facilities can be actuated
to load the file as a language description into system
primary memory. During the load, the onload translator will
convert the description into a linked representation of the
database needed to describe and guide the synthesis of new
language descriptions in the intermediate format. That is,
the system itself can now be used to create, as a
grammar-driven editor, additional language descriptions.



VI 



SUMMARY 



A. CONCLUSIONS. 

In the preceding chapters, a conceptual foundation for
the interactive creation of databases, structured
hierarchically according to a given context-free grammar,
has been provided. The primary conclusions supported by
this work are:

1. A basic model for the described process is that of a
valid sentential form generator, rendered determinate by
allowing for the interactive selection of which production
to apply and at which point in the already-derived structure
the selected substitution is to be made.

2. Notations exist (e.g., the R-ARGOT notation) for the
specification of general context-free grammars which are
both human-oriented and directly interpretable as the
knowledge base for such a system.

3. The basic mechanism correctly interprets ambiguous
or incomplete grammars, as well as allowing for the
synthesis of correctly labeled incomplete derivations.

4. Analogous mechanisms can be described which derive
and display not strings, but derivation trees which are
morphisms of validly derived strings under the specified
grammar.

5. The grammatical notation can be transformed into
context-independent operation codes with arguments which can
be stored in the leaf nodes of the derived tree in such a
way that subsequent synthesis proceeds correctly, and
subtree deletion can be efficiently and consistently
performed without examination of the surrounding context in
the tree.

6. The resulting derivation trees can be used to encode
semantic information in such a way that the trees can be
evaluated correctly without further reference to the
syntactic, as opposed to physical, structure of the tree.
(This assertion is a speculation, not a firm conclusion.)

7. A method exists for storing such structures in such 
a way that their consistency does not depend on any external 
data structures save the language definition itself. 
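Conclusion 1 can be made concrete with a toy example. The Python sketch below, built on a hypothetical two-rule expression grammar, expands one nonterminal of a sentential form at a time; the user supplies both the position and the production, and any choice that does not correspond to a legal substitution is rejected, so every form produced remains a valid sentential form.

```python
from typing import Dict, List

# Hypothetical toy grammar: nonterminal -> alternative right-hand sides.
GRAMMAR: Dict[str, List[List[str]]] = {
    "expr": [["term"], ["expr", "+", "term"]],
    "term": [["id"], ["(", "expr", ")"]],
}


def expand_at(form: List[str], position: int, choice: int) -> List[str]:
    """Replace the nonterminal at `position` by alternative `choice`.

    Both selections come from the user; illegal ones raise ValueError and
    leave the caller's form untouched.
    """
    if not 0 <= position < len(form):
        raise ValueError("no symbol at that position")
    symbol = form[position]
    if symbol not in GRAMMAR:
        raise ValueError(f"{symbol!r} is a terminal; nothing to expand")
    alternatives = GRAMMAR[symbol]
    if not 0 <= choice < len(alternatives):
        raise ValueError(f"{symbol!r} has no alternative {choice}")
    return form[:position] + alternatives[choice] + form[position + 1:]
```

Starting from `["expr"]`, the choices (0, 1) and then (2, 1) yield `expr + ( expr )`, a form that no parser ever had to recognize.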

B. WORK IN PROGRESS.

Implementation of the prototype subsystem is currently
in progress, with no difficulties foreseen. The
only module awaiting final coding and testing is the
Grammar-Driven Environment module itself, and the algorithmic
specification of the functions needed has already been
accomplished. Provided that no further difficulties are
encountered, a complete description of the prototype
subsystem will later be provided as a Naval Postgraduate
School Technical Report.
The prototype subsystem code is oriented toward a
demonstration of technical feasibility as opposed to storage
or execution time efficiency. However, it has been written
in a highly modularized manner, so that after
instrumentation and performance measurement, appropriate
modifications can be made fairly easily. An attempt has
been made to provide for the extension of the prototype
system to a more complete realization of the original system
design.

C. FUTURE RESEARCH DIRECTIONS. 

After completion of the prototype subsystem, two
directions are indicated for future investigation.

1. Extension of the Prototype Subsystem.

a. Symbol Table Implementation. 

A generalized symbol table data type must be
defined which will adequately support a wide range of
programming languages.

b. Semantic Action Implementation. 

A class of primitive operations (including
access facilities to the defined symbol table structure)
must be formulated, provision made for language-implementer
definition of additional primitives, and an AST interpreter
written.
c. Pattern-Matching.

A pattern-matching facility should be provided
as part of the user interface as a sophisticated means of
cursor control. A fairly simple pattern-matching
capability, when combined with the pre-existing capability
to access the AST in a syntax-oriented way, would allow the
user to search and access the structure in very
sophisticated ways; e.g., commands such as "find the next
occurrence of an assignment to identifier a" could easily be
formulated. Moreover, when combined with a relatively
straightforward debug facility (for example, the setting of
breakpoints), it could provide a very high-level program
test facility.
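A minimal sketch of the "next assignment to identifier a" search, assuming a simplified AST whose nodes are labeled with rule names from the Pascal grammar of Appendix B (`assignment`, `variable`, `identifier`). The `Ast` class and the preorder notion of "next" are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Ast:
    rule: str                                   # rule name labeling the node
    text: str = ""                              # leaf spelling, e.g. an identifier
    kids: List["Ast"] = field(default_factory=list)


def preorder(node: Ast) -> Iterator[Ast]:
    yield node
    for kid in node.kids:
        yield from preorder(kid)


def is_assignment_to(node: Ast, name: str) -> bool:
    """Pattern: an assignment whose variable starts with identifier `name`."""
    if node.rule != "assignment" or not node.kids:
        return False
    target = node.kids[0]
    return (target.rule == "variable" and bool(target.kids)
            and target.kids[0].rule == "identifier"
            and target.kids[0].text == name)


def find_next(root: Ast, cursor: Ast, name: str) -> Optional[Ast]:
    """Return the first match after `cursor` in preorder, or None."""
    passed = False
    for node in preorder(root):
        if passed and is_assignment_to(node, name):
            return node
        if node is cursor:
            passed = True
    return None
```

Because the tree is labeled by rule names, the pattern is stated in grammatical terms rather than as a text search, so an `a` appearing inside an expression is never mistaken for an assignment target.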

d. High-Level Language Descriptions.

The high-level format for both syntactic and
semantic language specification should be formulated and
implemented as a more convenient means of implementing new
languages.

e. Debugging Tools. 

Provisions should be made to allow the user to
set breakpoints, access the current data environment, and
order step-by-step execution modes from the editor.

f. Dynamic Language Changes. 

The feasibility of allowing language changes to
be made dynamically during AST creation or execution, at
points specifiable in the language definition, should be
investigated. Related to this problem is the provision of a
facility to link (perhaps dynamically) one AST to another.

g. Increased Storage Efficiency.

Once basic design parameters, now indefinite
(such as the number of primitive operations), are made final,
the desirability of packing data fields into AST nodes rather
than using the space-inefficient association list
implementation, and the resulting impact on time-efficiency,
should be studied.

h. Full User Interface. 

Deferred edit functions, such as delete and
insert, should be installed in the Grammar-Driven
Environment Module.

2. Additional Applications for the Technology.

The conceptual framework provided by this paper is
sufficiently general to support unexpected applications in
areas quite distant from the field of programming
environment design. A few such applications are suggested
below:

a. Generalized Editing. 

Generalized editors, as described in [Fraser, 1980],
are editors which provide for the manipulation and
display of data structures other than text files. The
mechanism is well suited to the direct editing of a
hierarchically organized database of any type.
b. Sparse Programming Languages.

Current programming languages are designed with
parser-based implementation as a fundamental assumption.
For that reason, they typically include many keyword and
punctuation symbols which are irritating, because
superfluous, to human users. Because the described
technology can utilize ambiguous grammars, sparse languages
with the minimum amount of punctuation needed for human
comprehensibility could be described and
implemented using grammar-driven synthesis as the
fundamental input mechanism. In fact, improved performance
from the synthesizer could be expected for such a
"pseudo-code"-like language, since the inherent semantic
density of the derivation tree could be made very high.

c. Artificial Intelligence Applications. 

In the described design, considerable pains have
been taken to provide a simple, uniform method for selecting
the grammar rule and the point of application, suitable for
use by a human operator. There is no fundamental reason,
however, why very complicated heuristic methods could not be
used to select the rule to be applied and the place in the
current structure where the application is to be made. For
instance, a production system (in the Artificial Intelligence
sense) could be used to perform this function. The resulting
hybrid system would have a heuristic front end and an
algorithmic back end, with the desirable property that
whatever structure the heuristic front end attempted to
build, the resulting structure would always be guaranteed to
be correct in terms of the "deep structure" specified by the
language description. Attempts by the heuristic module to
perform inconsistent modifications would be detected,
prevented, and reported by the synthesis module. A
knowledge representation based on such a system would be
able to interact with the user in very irregular, and
occasionally incorrect, ways while preserving a fundamental
internal database with guaranteed consistency.
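The division of labor between a heuristic front end and the synthesis module can be sketched as follows. Here a seeded random generator stands in for the production system, proposing arbitrary (and often illegal) expansions; the synthesis function performs only legal substitutions and reports the rest, so the resulting structure stays consistent with the grammar. The grammar and all function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical grammar used only for this illustration.
GRAMMAR: Dict[str, List[List[str]]] = {
    "stmt": [["id", ":=", "expr"], ["if", "expr", "then", "stmt"]],
    "expr": [["id"], ["expr", "+", "id"]],
}
TERMINALS = {"id", ":=", "if", "then", "+"}


def synthesize(form: List[str], position: int, choice: int) -> List[str]:
    """Synthesis module: perform a substitution only if it is legal."""
    if not 0 <= position < len(form):
        raise ValueError("no symbol at that position")
    symbol = form[position]
    if symbol not in GRAMMAR:
        raise ValueError(f"{symbol!r} is a terminal")
    if not 0 <= choice < len(GRAMMAR[symbol]):
        raise ValueError(f"{symbol!r} has no alternative {choice}")
    return form[:position] + GRAMMAR[symbol][choice] + form[position + 1:]


def heuristic_front_end(form: List[str], rng: random.Random) -> Tuple[int, int]:
    """Stand-in for a production system: proposes arbitrary moves."""
    return rng.randrange(len(form) + 1), rng.randrange(3)


def run(seed: int, steps: int = 20) -> Tuple[List[str], List[str]]:
    rng = random.Random(seed)
    form = ["stmt"]
    report: List[str] = []          # rejected proposals, reported back
    for _ in range(steps):
        position, choice = heuristic_front_end(form, rng)
        try:
            form = synthesize(form, position, choice)
        except ValueError as err:   # detected and prevented; form unchanged
            report.append(str(err))
    return form, report
```

However erratic the front end, every form returned by `run` is obtained only through legal substitutions.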
APPENDIX A. NOTATIONAL SYSTEMS FOR CONTEXT-FREE GRAMMARS
1. BACKUS-NAUR FORMAT (in R-ARGOT)
context-free-grammar: + production .

production: non-terminal " l r i gh t -h ana-s i ae ) **." . 

right-hand-side: + construct .

construct: { terminal | non-terminal } .

non-terminal: "<" string ">" .

terminal: string .

We assume that "string" is a sequence of any appropriate
character set not including the metasymbols.

Note that this notation is in itself a regular language.

2. ARGOT NOTATION (in R-ARGOT) 

ARGOT: + rule .

rule: rule-name concatenation .

concatenation: + sub-expression .

sub-expression: { optional-iteration
                | simple-iteration
                | list-iteration
                | option
                | alternation
                | optional-alternation
                | rule-name
                | terminal
                | group
                } .

optional-iteration: sub-expression .

simple-iteration: sub-expression .

list-iteration: "#" sub-expression sub-expression "..." .

option: "[" concatenation "]" .

alternation: "{" concatenation "|" alternatives "}" .

optional-alternation: "[" concatenation "|" alternatives "]" .

alternatives: # concatenation "|" ... .

group: "(" concatenation ")" .

terminal: """ string """ .

rule-name: string .

("string" is taken to be a predefined rule.)

3. R-ARGOT (in R-ARGOT)

R-ARGOT: + rule .

rule: rule-name expression .

expression: { concatenation
            | iteration
            | list-iteration
            | alternation
            } .

concatenation: + field .

iteration: "+" rule-name .

list-iteration: "#" rule-name field "..." .

alternation: "{" rule-name "|" alternatives "}" .

alternatives: # rule-name "|" ... .

field: { rule-name
       | option
       | terminal
       } .

option: "[" rule-name "]" .

terminal: """ string """ .

rule-name: string .

Note that this notation is, in itself, a regular language.
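The remark that R-ARGOT is itself a regular language can be checked informally: because no R-ARGOT construct nests inside another, a single regular expression recognizes any rule. The Python sketch below assumes rules are written `name: expression .`, as in the grammars of these appendices, and simplifies terminals to quoted strings without embedded quotes (so the `""" string """` form above is not covered).

```python
import re

NAME = r"[A-Za-z][A-Za-z0-9-]*"
TERMINAL = r'"[^"]*"'                       # simplification: no embedded quotes
FIELD = rf"(?:{NAME}|\[\s*{NAME}\s*\]|{TERMINAL})"
EXPRESSION = "|".join([
    rf"\+\s*{NAME}",                        # iteration
    rf"#\s*{NAME}\s+{FIELD}\s+\.\.\.",      # list-iteration
    rf"\{{\s*{NAME}(?:\s*\|\s*{NAME})+\s*\}}",  # alternation
    rf"{FIELD}(?:\s+{FIELD})*",             # concatenation
])
RULE = re.compile(rf"\s*{NAME}\s*:\s*(?:{EXPRESSION})\s*\.\s*")


def is_r_argot_rule(text: str) -> bool:
    """True if `text` is one R-ARGOT rule of the form `name: expression .`."""
    return RULE.fullmatch(text) is not None
```

A nested alternation such as `expr: { a | { b | c } } .` is rejected, which is exactly the restriction that keeps R-ARGOT regular while ARGOT, with its groups and nested sub-expressions, is not.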
APPENDIX B. A GRAMMAR FOR PASCAL IN R-ARGOT

PASCAL: "program" identifier "(" name-list ")" ";"
        block "." .

block: [ labels ] [ constants ] [ types ] [ variables ]
       [ subroutines ] "begin" statements "end" .

labels: "label" integers .

constants: "constant" c-decls .

types: "type" t-decls .

variables: "var" v-decls ";" .

subroutines: + s-decl .

integers: + integer .

c-decls: # c-decl ";" ... .

c-decl: identifier "=" constant .

t-decls: # t-decl ";" ... .

t-decl: identifier "=" type .

v-decls: # v-decl ";" ... .

v-decl: name-list ":" type .

name-list: # identifier "," ... .

s-decl: { p-decl
        | f-decl
        } .

p-decl: "procedure" identifier [ parameters ] ";"
        block ";" .

f-decl: "function" identifier [ parameters ] ":" identifier ";"
        block .

parameters: "(" param-list ")" .

param-list: # param-section ";" ... .
param-section: { f-params
               | v-params
               | p-params
               | c-params
               } .

f-params: "function" name-list ":" identifier .

v-params: "var" name-list ":" identifier .

p-params: "procedure" name-list .

c-params: name-list ":" identifier .

type: { scalar-type
      | subrange-type
      | pointer-type
      | set-type
      | array-type
      | record-type
      | file-type
      | identifier
      } .

scalar-type: "(" name-list ")" .

subrange-type: constant ".." constant .

pointer-type: "^" identifier .

set-type: [ packed ] "set" "of" simple-type .

array-type: [ packed ] "array" "[" subscripts "]" "of" type .

record-type: [ packed ] "record" [ field-list ] "end" .

file-type: [ packed ] "file" "of" type .

packed: "packed" .

simple-type: { identifier
             | scalar-type
             | subrange-type
             } .

field-list: { var-fields
            | mixed-fields
            } .

mixed-fields: fixed-fields [ and-var-fields ] .
and-var-fields: ";" var-fields .

fixed-fields: # fixed-field ";" ... .

fixed-field: name-list ":" type .

var-fields: "case" [ tag ] identifier "of"
            variants .

variants: # variant ";" ... .

variant: constant-list ":" "(" [ field-list ] ")" .

constant-list: # constant "," ... .

statements: # statement ";" ... .

statement: [ integer ] [ action ] .

action: { assignment
        | procedure-call
        | compound
        | if-statement
        | repeat
        | while
        | for
        | case-statement
        | goto
        | with
        } .

assignment: variable ":=" expression .

procedure-call: identifier [ arguments ] .

arguments: "(" arglist ")" .

arglist: # argument "," ... .

argument: { identifier
          | expression
          } .

compound: "begin"
          statements
          "end" .

if-statement: "if" expression "then"
              statement
              [ else-part ] .
else-part: "else"
           statement .

repeat: "repeat"
        statements
        "until" expression .

while: "while" expression "do"
       statement .

for: "for" identifier ":=" expression t-or-d expression "do"
     statement .

t-or-d: { downto
        | to
        } .

downto: "downto" .

to: "to" .

case-statement: "case" expression "of"
                cases
                "end" .

cases: # case ";" ... .

case: constant-list ":"
      statement .

with: "with" variables "do"
      statement .

goto: "goto" integer .

variables: # variable "," ... .

expression: { lt
            | lte
            | eq
            | gte
            | gt
            | neq
            | in
            | s-expression
            } .

lt: s-expression "<" s-expression .

lte: s-expression "<=" s-expression .

eq: s-expression "=" s-expression .
gte: s-expression ">=" s-expression .

gt: s-expression ">" s-expression .

neq: s-expression "<>" s-expression .

in: s-expression "in" s-expression .

s-expression: [ sign ] u-expression .

sign: { plus-sign
      | minus-sign
      } .

plus-sign: "+" .

minus-sign: "-" .

u-expression: { plus
              | minus
              | or
              | term
              } .

plus: term "+" term .

minus: term "-" term .

or: term "or" term .

term: { times
      | quot
      | div
      | mod
      | and
      | factor
      } .

times: factor "*" factor .

quot: factor "/" factor .

div: factor "div" factor .

mod: factor "mod" factor .

and: factor "and" factor .
factor: { group
        | not
        | set
        | v-or-c
        } .

group: "(" expression ")" .

not: "not" factor .

set: "[" [ set-members ] "]" .

set-members: # set-member "," ... .

set-member: { range
            | expression
            } .

range: expression ".." expression .

v-or-c: { unsigned-constant
        | variable
        } .

variable: identifier [ modifiers ] .

modifiers: + modifier .

modifier: { subscript
          | field-reference
          | indirection
          } .

subscript: "[" expressions "]" .

field-reference: "." identifier .

indirection: "^" .

expressions: # expression "," ... .

It is assumed that predefined input scanners exist for
the rule names "integer", "identifier", "constant", and
"unsigned-constant".
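The central idea, that the grammar is consulted as data rather than compiled into a parser, can be sketched for a few of the rules above. The encoding below is an assumption; the operators it returns (HEAD, ITER, LIST, ALT, NT, COPT, IOPT, LOPT) are those of Appendix C, and the ITER and LIST child lists follow the `it2` and `lt2` templates of Appendix D.

```python
from typing import Any, Dict, List, Tuple

# A few Appendix B rules held as data (hypothetical encoding):
#   ("concat", [(tag, value), ...])  tag "T" terminal, "NT" rule, "OPT" option
#   ("iter", element)                for  name: + element .
#   ("list", (element, separator))   for  name: # element "sep" ... .
#   ("alt", [choice, ...])           for  name: { a | b | ... } .
RULES: Dict[str, Tuple[str, Any]] = {
    "while": ("concat", [("T", "while"), ("NT", "expression"),
                         ("T", "do"), ("NT", "statement")]),
    "if-statement": ("concat", [("T", "if"), ("NT", "expression"), ("T", "then"),
                                ("NT", "statement"), ("OPT", "else-part")]),
    "subroutines": ("iter", "s-decl"),
    "statements": ("list", ("statement", ";")),
    "t-or-d": ("alt", ["downto", "to"]),
}

Slot = Tuple[str, str]


def expand(rule: str) -> Tuple[str, List[Slot]]:
    """Return (operator for the expanded node, its new free children)."""
    kind, body = RULES[rule]
    if kind == "concat":
        # Terminals get no nodes here; only the display would show them.
        return "HEAD", [("COPT" if tag == "OPT" else "NT", value)
                        for tag, value in body if tag != "T"]
    if kind == "iter":
        return "ITER", [("NT", body), ("IOPT", rule)]
    if kind == "list":
        element, _separator = body   # separator belongs to the display only
        return "LIST", [("NT", element), ("LOPT", rule)]
    if kind == "alt":
        # The node is relabeled ALT; each entry is a menu choice, not a child.
        return "ALT", [("NT", choice) for choice in body]
    raise ValueError(f"unknown rule kind {kind!r}")
```

Replacing the `RULES` table is all it would take to switch languages, which is the dynamic interchangeability of language definitions claimed for the approach.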
APPENDIX C. TRANSFORMATION TEMPLATE GRAMMAR

The following grammar defines symbol strings which are
interpreted as calls to tree-building and node-modifying
routines whose existence is assumed, as is the interpreter
which makes those calls. Also implicit in the following
definitions and discussion is the notion of a "current node",
defined, for the purpose of applying templates, to be any
free node in an AST.



template:   { subtree | siblist } .

subtree:    boundnode [ childlist ] .

childlist:  "(" siblist ")" .

siblist:    # freenode ";" ... .

boundnode:  boundop rulefield .

freenode:   freeop rulefield .

rulefield:  "," rulename .

boundop:    { HEAD | ITER | LIST | pdf } .

pdf:        { (predefined functions) } .

freeop:     { NT | ALT | COPT | IOPT | LOPT | TERM } .

rulename:   { (grammar rulenames) | (predefined rulenames) } .



The Template Grammar produces operator and rulename
pairs, both bound and free, punctuated by the terminal
symbols "(", ";", ",", and ")", which are interpreted as
follows:

"(": Create a child node under the current node, make
the node created the current node, and overwrite the OP
field with the operator listed next.

";": Create a right sibling of the current node, make
the node created the current node, and overwrite the OP
field with the operator listed next.

",": Overwrite the RULE field of the current node with
the rulename listed next.

")": Make the father of the current node the new
current node.

The first symbol of every template is an operator,
either free or bound, which overwrites the OP field of the
current node. The current node is the only node in the AST
which is modified in any way by a template; new nodes may
be created, but always within the context of the current
node.

The templates defined by this grammar allow definition
of the transformations in Chapter III. The following
examples illustrate the constructions most commonly
encountered:



1. Single node replacement, rule field unchanged:

   Transformation:  NT, a => ALT, a
   Template:        ALT, a

2. Single node replacement, operator and rulename modified:

   Transformation:  ALT, a => NT, r
   Template:        NT, r

3. Replacement with sibling string:

   Transformation:  IOPT, i => COPT, r2  NT, r1  IOPT, i
   Template:        COPT, r2; NT, r1; IOPT, i

4. Replacement with subtree:

   Transformation:  NT, c => NT, r1  COPT, r2  NT, r3
   Template:        HEAD, c ( NT, r1; COPT, r2; NT, r3 )
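The four punctuation rules can be exercised with a small interpreter. The Python sketch below is an assumed implementation (the `Node` class and `apply_template` are hypothetical names): it tokenizes a template and applies the rules for "(", ";", ",", and ")" literally, starting from the current node.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    op: str
    rule: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def apply_template(current: Node, template: str) -> Node:
    """Apply a template to `current` using the "(", ";", ",", ")" rules."""
    tokens = iter(re.findall(r"[(),;]|[^\s(),;]+", template))

    def take() -> str:
        try:
            return next(tokens)
        except StopIteration:
            raise ValueError("template ends where a symbol is expected") from None

    node = current
    node.op = take()                    # first symbol: operator of current node
    for tok in tokens:
        if tok == "(":                  # new child becomes the current node
            child = Node(take(), "", parent=node)
            node.children.append(child)
            node = child
        elif tok == ";":                # new right sibling becomes current
            parent = node.parent
            if parent is None:
                raise ValueError("';' needs a node with a father")
            sibling = Node(take(), "", parent=parent)
            parent.children.insert(parent.children.index(node) + 1, sibling)
            node = sibling
        elif tok == ",":                # overwrite the RULE field
            node.rule = take()
        elif tok == ")":                # father becomes the current node
            if node.parent is None:
                raise ValueError("unbalanced ')'")
            node = node.parent
        else:
            raise ValueError(f"unexpected symbol {tok!r}")
    return current
```

For example, applying template 4 to a node labeled NT, c leaves it labeled HEAD, c with the three free children NT, r1; COPT, r2; NT, r3, while template 3 applied to an IOPT, i leaf leaves its father with the children COPT, r2; NT, r1; IOPT, i in that order.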
APPENDIX D

INTERMEDIATE-LEVEL LANGUAGE DEFINITION GRAMMAR

ILD: langname rulelist [ extensions ] .

rulelist: + rule .

rule: { c-rule
      | a-rule
      | i-rule
      | l-rule } .

c-rule: { c-rule-a
        | c-rule-b } .

c-rule-a: c-rulename ":" cdef-a
          "=>" ctla "=>" csla .

cdef-a: + defpart .

defpart: { rulename | option | terminal } .

option: "[" rulename "]" .

ctla: headop "(" freelist ")" .

headop: { head | pdf } .

head: "HEAD" .

pdf: { (predefined functions) } .

freelist: # freenode ... .


freenode: freeop "," rulename .

freeop: { nt | copt } .

nt: "NT" .

copt: "COPT" .

csla: + dispart .

dispart: { subtree | literal | format } .

subtree: integer [ opdisfld ] .

opdisfld: { optodf | pdfodf | undodf } .

optodf: "=""[" rulename "]""" .

pdfodf: "=""<" rulename ">""" .

undodf: "=""{" rulename "}""" .

c-rule-b: c-rulename ":" cdef-b
          "=>" ctlb "=>" cslb .

cdef-b: terminal .

ctlb: "HEAD," c-rulename .

cslb: + termpart .

termpart: { literal | format } .


a-rule: a-rulename ":" adef
        "=>" at1 "=>" at2 "=>" as1 "=>" as2 .

adef: "{" altlist "}" .

altlist: # alt "|" ... .

alt: altchar ":" rulename .

at1: "ALT," a-rulename .

at2: "{" alt-temp "}" .

alt-temp: # alt-t "|" ... .

alt-t: altchar ": NT," rulename .

as1: "{" a-rulename "}" .

as2: "{" alt-disp "}" .

alt-disp: # alt-d "|" ... .

alt-d: altchar ":" rulename .


i-rule: i-rulename ":" idef
        "=>" it1 "=>" it2 "=>" is1 "=>" is2 .

idef: "+" rulename1 .

it1: "ITER ( NT," rulename1 "; IOPT," i-rulename ")" .

it2: "NT," rulename1 "; IOPT," i-rulename .

is1: "$1" .

is2: "[" i-rulename "]" .

l-rule: { l-rule-a
        | l-rule-b
        | l-rule-c } .

l-rule-a: l-rulename ":" ldef-a
          "=>" lt1 "=>" lt2a "=>" ls1 "=>" ls2a "=>" ls3 .

ldef-a: "#" rulename1 rulename2 .

lt2a: "NT," rulename2 "; NT," rulename1 "; LOPT," l-rulename .

ls2a: "$1 $2" .

l-rule-b: l-rulename ":" ldef-b
          "=>" lt1 "=>" lt2b "=>" ls1 "=>" ls2b "=>" ls3 .

ldef-b: "#" rulename1 "[" rulename2 "]" "..." .

lt2b: "COPT," rulename2 "; NT," rulename1 "; LOPT," l-rulename .

ls2b: "$1=[" rulename2 "] $2" .

l-rule-c: l-rulename ":" ldef-c
          "=>" lt1 "=>" lt2c "=>" ls1 "=>" ls2c "=>" ls3 .

ldef-c: rulename1 terminal .

lt2c: "NT," rulename1 "; LOPT," l-rulename .

ls2c: terminal "$1" .
lt1: "LIST ( NT," rulename1 "; LOPT," l-rulename ")" .

ls1: "$1" .

ls3: "[" l-rulename "]" .

format: { newline | tab | untab } .

newline: "NL" .

tab: "TB" .

untab: "UT" .

extensions: userpdr userpdf .

userpdr: (undefined) .

userpdf: (undefined) .
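Because the template and display fields of an `i-rule` or `l-rule` are fixed patterns in the rule's names, they can be generated mechanically. The hypothetical helper below derives them, following the `it1`, `it2`, `is1`, `is2` and `lt1`, `lt2c`, `ls1`, `ls2c`, `ls3` productions above, from an R-ARGOT iteration rule or a list-iteration rule with a terminal separator; the exact spacing of the generated text is an assumption.

```python
import re
from typing import Dict

NAME = r"[A-Za-z][\w-]*"


def ild_fields(rule: str) -> Dict[str, str]:
    """Derive ITER/LIST template and display fields for one R-ARGOT rule.

    Handles `name: + element .` and `name: # element "sep" ... .` only.
    """
    m = re.fullmatch(rf"\s*({NAME})\s*:\s*\+\s*({NAME})\s*\.?\s*", rule)
    if m:
        name, element = m.groups()
        return {
            "it1": f"ITER ( NT, {element}; IOPT, {name} )",
            "it2": f"NT, {element}; IOPT, {name}",
            "is1": "$1",
            "is2": f"[{name}]",
        }
    m = re.fullmatch(
        rf'\s*({NAME})\s*:\s*#\s*({NAME})\s+"([^"]*)"\s+\.\.\.\s*\.?\s*', rule)
    if m:
        name, element, separator = m.groups()
        return {
            "lt1": f"LIST ( NT, {element}; LOPT, {name} )",
            "lt2c": f"NT, {element}; LOPT, {name}",
            "ls1": "$1",
            "ls2c": f'"{separator}" $1',
            "ls3": f"[{name}]",
        }
    raise ValueError(f"not an iteration or terminal-separated list rule: {rule!r}")
```

This mirrors the observation that only the `c-rule` and `a-rule` forms carry information that cannot be recomputed from the rule definition alone.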
APPENDIX E

ILD GRAMMAR LANGUAGE DEFINITION

ILD: langname rulelist [extensions]
     => HEAD, ILD
        (NT, String;
         NT, rulelist;
         COPT, extensions)
     => $1="<langname>" $2 $3="[extensions]" .

rulelist: + rule
     => ITER, rulelist
        (NT, rule;
         IOPT, rulelist)
     => NT, rule;
        IOPT, rulelist
     => $1
     => "[rulelist]" .
rule: { c-rule
      | a-rule
      | i-rule
      | l-rule }
     => ALT, rule
     => { c:NT, c-rule
        | a:NT, a-rule
        | i:NT, i-rule
        | l:NT, l-rule }
     => "{rule}"
     => "{ c:c-rule | a:a-rule | i:i-rule | l:l-rule }" .

c-rule: { c-rule-a
        | c-rule-b }
     => ALT, c-rule
     => { a:NT, c-rule-a
        | b:NT, c-rule-b }
     => "{c-rule}"
     => "{ a:c-rule-a | b:c-rule-b }" .
c-rule-a: c-rulename ":" cdef-a
          "=>" ctla "=>" csla
     => HEAD, c-rule-a
        (NT, String;
         NT, cdef-a;
         NT, ctla;
         NT, csla)
     => $1="<c-rulename>" ":" $2 "=>" $3 "=>" $4 .

cdef-a: + defpart
     => ITER, cdef-a
        (NT, defpart;
         IOPT, cdef-a)
     => NT, defpart;
        IOPT, cdef-a
     => $1
     => "[defpart]" .

defpart: { rulename
         | option
         | terminal }
     => ALT, defpart
     => { r:NT, String
        | o:NT, option
        | t:NT, terminal }
     => "{defpart}"
     => "{ r:rulename | o:option | t:terminal }" .

option: "[" rulename "]"
     => HEAD, option
        (NT, String)
     => "[" $1="<rulename>" "]" .

ctla: headop "(" freelist ")"
     => HEAD, ctla
        (NT, headop;
         NT, freelist)
     => $1 "(" $2 ")" .

headop: { head | pdf }
     => ALT, headop
     => { h:NT, head
        | p:NT, pdf }
     => "{headop}"
     => "{ h:HEAD | p:pdf }" .

head: "HEAD"
     => HEAD, head
     => "HEAD" .

pdf: { (predefined functions) }
     => ALT, pdf
     => { }
     => "{pdf}"
     => "{ }" .
freelist: # freenode ...
     => LIST, freelist
        (NT, freenode;
         LOPT, freelist)
     => NT, freenode;
        LOPT, freelist
     => $1
     => $1
     => "[freenode]" .

freenode: freeop "," rulename
     => HEAD, freenode
        (NT, freeop;
         NT, String)
     => $1 "," $2="<rulename>" .

freeop: { nt | copt }
     => ALT, freeop
     => { n:NT, nt
        | c:NT, copt }
     => "{freeop}"
     => "{ n:NT | c:COPT }" .

nt: "NT"
     => HEAD, nt
     => "NT" .
copt: "COPT"
     => HEAD, copt
     => "COPT" .

csla: + dispart
     => ITER, csla
        (NT, dispart;
         IOPT, csla)
     => NT, dispart;
        IOPT, csla
     => $1
     => "[dispart]" .

dispart: { subtree | literal | format }
     => ALT, dispart
     => { s:NT, subtree
        | l:NT, literal
        | f:NT, format }
     => "{dispart}"
     => "{ s:subtree | l:literal | f:format }" .

subtree: integer [opdisfld]
     => HEAD, subtree
        (NT, Integer;
         COPT, opdisfld)
     => $1="<integer>" $2="[opdisfld]" .
opdisfld: { optodf | pdfodf | undodf }
     => ALT, opdisfld
     => { o:NT, optodf
        | p:NT, pdfodf
        | u:NT, undodf }
     => "{opdisfld}"
     => "{ o:optodf | p:pdfodf | u:undodf }" .

optodf: "=""[" rulename "]"""
     => HEAD, optodf
        (NT, String)
     => "=""[" $1="<rulename>" "]""" .

pdfodf: "=""<" rulename ">"""
     => HEAD, pdfodf
        (NT, String)
     => "=""<" $1="<rulename>" ">""" .

undodf: "=""{" rulename "}"""
     => HEAD, undodf
        (NT, String)
     => "=""{" $1="<rulename>" "}""" .
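The display fields above (`$n`, optionally followed by a `="<...>"`, `="[...]"`, or `="{...}"` placeholder, plus quoted literals and the NL, TB, UT formats of Appendix D) suggest a simple rendering procedure. The Python sketch below is one reading, not the thesis's definition: it assumes `$n` shows the display of child n once that child has been expanded, falls back to the quoted placeholder otherwise, joins pieces with single spaces, and treats TB and UT as indenting or unindenting lines begun by the next NL by an assumed four spaces.

```python
import re
from typing import List, Optional

# $n with optional ="placeholder", a quoted literal, or a format word.
TOKEN = re.compile(r'\$(\d+)(?:="((?:[^"]|"")*)")?|"((?:[^"]|"")*)"|(NL|TB|UT)\b')


def render(display: str, children: List[Optional[str]], indent: int = 4) -> str:
    """Render a display field; children[i] is child i+1's text, or None."""
    lines: List[str] = []
    words: List[str] = []
    level = line_level = 0
    pos = 0
    while pos < len(display):
        if display[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(display, pos)
        if m is None:
            raise ValueError(f"cannot read display field at {display[pos:]!r}")
        pos = m.end()
        number, placeholder, literal, fmt = m.groups()
        if number is not None:
            child = children[int(number) - 1]
            if child is not None:
                words.append(child)                    # expanded child
            elif placeholder is not None:
                words.append(placeholder.replace('""', '"'))
            else:
                words.append(f"${number}")             # no placeholder given
        elif literal is not None:
            words.append(literal.replace('""', '"'))
        elif fmt == "NL":
            lines.append(" " * (line_level * indent) + " ".join(words))
            words, line_level = [], level
        elif fmt == "TB":
            level += 1
        else:  # "UT"
            level = max(0, level - 1)
    lines.append(" " * (line_level * indent) + " ".join(words))
    return "\n".join(lines)
```

Under this reading, the `ILD` display `$1="<langname>" $2 $3="[extensions]"` shows `<langname>` and `[extensions]` as prompts until those subtrees are synthesized.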
c-rule-b: c-rulename ":" cdef-b
          "=>" ctlb "=>" cslb
     => HEAD, c-rule-b
        (NT, String;
         NT, cdef-b;
         NT, ctlb;
         NT, cslb)
     => $1="<c-rulename>" ":" $2 "=>" $3 "=>" $4 .

cdef-b: terminal
     => HEAD, cdef-b
        (NT, terminal)
     => $1 .

ctlb: "HEAD," c-rulename
     => HEAD, ctlb
        (NT, String)
     => "HEAD," $1="<c-rulename>" .

cslb: + termpart
     => ITER, cslb
        (NT, termpart;
         IOPT, cslb)
     => NT, termpart;
        IOPT, cslb
     => $1
     => "[termpart]" .
termpart: { literal | format }
     => ALT, termpart
     => { l:NT, literal
        | f:NT, format }
     => "{termpart}"
     => "{ l:literal | f:format }" .

a-rule: a-rulename ":" adef
        "=>" at1 "=>" at2 "=>" as1 "=>" as2
     => HEAD, a-rule
        (NT, String;
         NT, adef;
         NT, at1;
         NT, at2;
         NT, as1;
         NT, as2)
     => $1="<a-rulename>" ":" $2
        "=>" $3 "=>" $4 "=>" $5 "=>" $6 .

adef: "{" altlist "}"
     => HEAD, adef
        (NT, altlist)
     => "{" $1 "}" .
altlist: # alt "|" ...
     => LIST, altlist
        (NT, alt;
         LOPT, altlist)
     => NT, alt;
        LOPT, altlist
     => $1
     => $1
     => "[altlist]" .

alt: altchar ":" rulename
     => HEAD, alt
        (NT, Character;
         NT, String)
     => $1="<altchar>" ":" $2="<rulename>" .

at1: "ALT," a-rulename
     => HEAD, at1
        (NT, String)
     => "ALT," $1="<a-rulename>" .

at2: "{" alt-temp "}"
     => HEAD, at2
        (NT, alt-temp)
     => "{" $1 "}" .
alt-temp: # alt-t "|" ...
     => LIST, alt-temp
        (NT, alt-t;
         LOPT, alt-temp)
     => NT, alt-t;
        LOPT, alt-temp
     => $1
     => $1
     => "[alt-t]" .

alt-t: altchar ": NT," rulename
     => HEAD, alt-t
        (NT, Character;
         NT, String)
     => $1="<altchar>" ": NT," $2="<rulename>" .

as1: "{" a-rulename "}"
     => HEAD, as1
        (NT, String)
     => "{" $1="<a-rulename>" "}" .

as2: "{" alt-disp "}"
     => HEAD, as2
        (NT, alt-disp)
     => "{" $1 "}" .
alt-disp: # alt-d "|" ...
     => LIST, alt-disp
        (NT, alt-d;
         LOPT, alt-disp)
     => NT, alt-d;
        LOPT, alt-disp
     => $1
     => $1
     => "[alt-disp]" .

alt-d: altchar ":" rulename
     => HEAD, alt-d
        (NT, Character;
         NT, String)
     => $1="<altchar>" ":" $2="<rulename>" .

i-rule: i-rulename ":" idef
        "=>" it1 "=>" it2 "=>" is1 "=>" is2
     => HEAD, i-rule
        (NT, String;
         NT, idef;
         NT, it1;
         NT, it2;
         NT, is1;
         NT, is2)
     => $1="<i-rulename>" ":" $2
        "=>" $3 "=>" $4 "=>" $5 "=>" $6 .
idef: "+" rulename1
     => HEAD, idef
        (NT, String)
     => $1="<rulename1>" .

it1: "ITER ( NT," rulename1 "; IOPT," i-rulename ")"
     => HEAD, it1
        (NT, String;
         NT, String)
     => "ITER ( NT," $1="<rulename1>" "; IOPT,"
        $2="<i-rulename>" ")" .

it2: "NT," rulename1 "; IOPT," i-rulename
     => HEAD, it2
        (NT, String;
         NT, String)
     => "NT," $1="<rulename1>" "; IOPT," $2="<i-rulename>" .

is1: "$1"
     => HEAD, is1
     => "$1" .

is2: "[" i-rulename "]"
     => HEAD, is2
        (NT, String)
     => "[" $1="<i-rulename>" "]" .
l-rule: { l-rule-a
        | l-rule-b
        | l-rule-c }
     => ALT, l-rule
     => { a:NT, l-rule-a
        | b:NT, l-rule-b
        | c:NT, l-rule-c }
     => "{l-rule}"
     => "{ a:l-rule-a | b:l-rule-b | c:l-rule-c }" .

l-rule-a: l-rulename ":" ldef-a
          "=>" lt1 "=>" lt2a "=>" ls1 "=>" ls2a "=>" ls3
     => HEAD, l-rule-a
        (NT, String;
         NT, ldef-a;
         NT, lt1;
         NT, lt2a;
         NT, ls1;
         NT, ls2a;
         NT, ls3)
     => $1="<l-rulename>" ":" $2
        "=>" $3 "=>" $4 "=>" $5 "=>" $6 "=>" $7 .
ldef-a: "#" rulename1 rulename2
     => HEAD, ldef-a
        (NT, String;
         NT, String)
     => "#" $1="<rulename1>" $2="<rulename2>" .

lt2a: "NT," rulename2 "; NT," rulename1 "; LOPT," l-rulename
     => HEAD, lt2a
        (NT, String;
         NT, String;
         NT, String)
     => "NT," $1="<rulename2>" "; NT," $2="<rulename1>"
        "; LOPT," $3="<l-rulename>" .

ls2a: "$1 $2"
     => HEAD, ls2a
     => "$1 $2" .

l-rule-b: l-rulename ":" ldef-b
      "=>" lt1 "=>" lt2b "=>" ls1 "=>" ls2b "=>" ls3
  => HEAD, l-rule-b
     (NT, String;
      NT, ldef-b;
      NT, lt1;
      NT, lt2b;
      NT, ls1;
      NT, ls2b;
      NT, ls3)
  => $1="<l-rulename>" ":" $2
     "=>" $3 "=>" $4 "=>" $5 "=>" $6 "=>" $7 .


ldef-b: "#" rulename1 "[" rulename2 "]" "..."
  => HEAD, ldef-b
     (NT, String;
      COPT, String)
  => "#" $1="<rulename1>" "[" $2="<rulename2>" "]" "..." .


lt2b: "COPT," rulename2 "; NT," rulename1
      "; LOPT," l-rulename
  => HEAD, lt2b
     (COPT, String;
      NT, String;
      NT, String)
  => "COPT," $1="<rulename2>" "; NT," $2="<rulename1>"
     "; LOPT," $3="<l-rulename>" .

ls2b: "$1=[" rulename2 "] $2"
  => HEAD, ls2b
     (NT, String)
  => "$1=[" $1="<rulename2>" "] $2" .

l-rule-c: l-rulename ":" ldef-c
      "=>" lt1 "=>" lt2c "=>" ls1 "=>" ls2c "=>" ls3
  => HEAD, l-rule-c
     (NT, String;
      NT, ldef-c;
      NT, lt1;
      NT, lt2c;
      NT, ls1;
      NT, ls2c;
      NT, ls3)
  => $1="<l-rulename>" ":" $2
     "=>" $3 "=>" $4 "=>" $5 "=>" $6 "=>" $7 .

ldef-c: rulename1 terminal "..."
  => HEAD, ldef-c
     (NT, String;
      NT, terminal)
  => $1="<rulename1>" $2 "..." .

lt2c: "NT," rulename1 "; LOPT," l-rulename
  => HEAD, lt2c
     (NT, String;
      NT, String)
  => "NT," $1="<rulename1>" "; LOPT," $2="<l-rulename>" .


ls2c: terminal "$1"
  => HEAD, ls2c
     (NT, terminal)
  => $1="<terminal>" "$1" .


lt1: "LIST ( NT," rulename1 "; LOPT," l-rulename ")"
  => HEAD, lt1
     (NT, String;
      NT, String)
  => "LIST ( NT," $1="<rulename1>" "; LOPT,"
     $2="<l-rulename>" ")" .


ls1: "$1"
  => HEAD, ls1
  => "$1" .


ls3: "[" l-rulename "]"
  => HEAD, ls3
     (NT, String)
  => "[" $1="<l-rulename>" "]" .
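
The three list-rule forms differ only in their lt2 and ls2 templates. The following Python sketch assembles the text the lt1, lt2, ls1, ls2 and ls3 templates would produce, assuming a list rule is described by its name, its element rule, and a separator that is either a rule (l-rule-a), an optional rule (l-rule-b) or a terminal (l-rule-c). The function name, the kind codes "a", "b" and "c", the spacing, and the re-quoting of a terminal separator are illustrative assumptions.

```python
def list_templates(l_rulename: str, rulename1: str, kind: str, sep: str) -> dict:
    """Fill in the lt1, lt2, ls1, ls2 and ls3 templates for a list rule.

    kind "a": separator is the rule sep           (lt2a / ls2a)
    kind "b": separator is the optional rule sep  (lt2b / ls2b)
    kind "c": separator is the terminal sep       (lt2c / ls2c)
    Hypothetical helper; spacing and quoting are assumptions.
    """
    templates = {
        # lt1: "LIST ( NT," $1="<rulename1>" "; LOPT," $2="<l-rulename>" ")"
        "lt1": f"LIST ( NT, {rulename1}; LOPT, {l_rulename})",
        "ls1": "$1",                  # ls1: "$1"
        "ls3": f"[{l_rulename}]",     # ls3: "[" $1="<l-rulename>" "]"
    }
    if kind == "a":
        templates["lt2"] = f"NT, {sep}; NT, {rulename1}; LOPT, {l_rulename}"
        templates["ls2"] = "$1 $2"
    elif kind == "b":
        templates["lt2"] = f"COPT, {sep}; NT, {rulename1}; LOPT, {l_rulename}"
        templates["ls2"] = f"$1=[{sep}] $2"
    elif kind == "c":
        templates["lt2"] = f"NT, {rulename1}; LOPT, {l_rulename}"
        templates["ls2"] = f'"{sep}" $1'   # terminal text, re-quoted (assumed)
    else:
        raise ValueError(f"unknown list kind {kind!r}")
    return templates


if __name__ == "__main__":
    print(list_templates("names", "id", "c", ","))
```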



terminal: literal
  => HEAD, terminal
     (NT, String)
  => """" $1="<terminal>" .


literal: : literal
  => HEAD, literal
     (NT, String)
  => """" $1="<literal>" """" .


format: { newline | tab | untab }
  => ALT, format
  => { n:NT, newline
     | t:NT, tab
     | u:NT, untab }
  => "{format}"
  => "{ n:newline | t:tab | u:untab }" .


newline: "NL"
  => HEAD, newline
  => "NL" .


tab: "TB"
  => HEAD, tab
  => "TB" .

untab: "UT"
  => HEAD, untab
  => "UT" .

extensions: userpdr userpdf
  => HEAD, extensions
     (NT, userpdr;
      NT, userpdf)
  => $1 $2 .

userpdr: (undefined) . 

userpdf: (undefined) . 



String, Integer, and Character are system predefined rules.

APPENDIX F: MEMORANDUM LANGUAGE DEFINITION



The following Language Definition, constructed by hand,
illustrates the templates and schemas required for the
definition of a simple grammar. When realized as an AST via
the ILD Grammar Directed Editor and interpreted by the system
predefined function ILD, this Language Definition could be
installed in the Language Definition Module as part of a
Memorandum GDE.



memo: [salutation] body [closing]
  => ILD, memo
     (COPT, salutation;
      NT, body;
      COPT, closing)
  => NL $1="[salutation]" $2 NL TB TB $3="[closing]" .

salutation: "Dear" name
  => HEAD, salutation
     (NT, String)
  => "Dear" $1="<name>" .



body: + paragraph
  => ITER, body
     (NT, paragraph;
      IOPT, body)
  => NT, paragraph;
     IOPT, body
  => NL TB UT $1
  => NL "[paragraph]" .

paragraph: + lines
  => ITER, paragraph
     (NT, String;
      IOPT, paragraph)
  => NT, String;
     IOPT, paragraph
  => $1="<line>" NL
  => "[line]" NL .

closing: "Sincerely," name
  => HEAD, closing
     (NT, String)
  => "Sincerely," NL $1="<name>" .



String is a system predefined rule.
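
To see how these display templates could turn a memorandum tree into text, the following Python sketch expands them into a token list and lays the tokens out. The meaning given to the format codes NL, TB and UT (the newline, tab and untab rules of the ILD definition) is an assumption, chosen because it makes the templates behave sensibly: NL ends the current line, TB moves the left margin one tab stop to the right (indenting the current line too if nothing has been written on it yet), and UT moves the margin back for later lines only. Under that reading, "NL TB UT $1" indents the first line of each paragraph and "NL TB TB $3" indents the whole closing. An unexpanded optional salutation or closing is shown as its prompt, as the $1="[salutation]" and $3="[closing]" templates suggest. The function names and the four-space tab width are hypothetical.

```python
from typing import Optional

TAB = "    "  # assumed width of one tab stop


def render(tokens: list) -> str:
    """Lay out text tokens and the format codes NL, TB, UT (assumed meanings)."""
    lines = []
    text = ""          # text written on the current line so far
    line_indent = 0    # tab stops applied to the current line
    margin = 0         # tab stops applied to lines started from now on
    for tok in tokens:
        if tok == "NL":                  # end the current line
            lines.append(TAB * line_indent + text)
            text, line_indent = "", margin
        elif tok == "TB":                # shift the margin right
            margin += 1
            if not text:                 # an empty line moves with it
                line_indent += 1
        elif tok == "UT":                # shift the margin left, later lines only
            margin = max(0, margin - 1)
        else:                            # ordinary text, space separated
            text = f"{text} {tok}" if text else tok
    if text:
        lines.append(TAB * line_indent + text)
    return "\n".join(lines)


def memo_tokens(salutation: Optional[str], paragraphs: list,
                closing: Optional[str]) -> list:
    """Expand the memo, salutation, body, paragraph and closing templates."""
    # salutation: "Dear" $1="<name>"; COPT prompt "[salutation]" if absent
    sal = ["Dear", salutation] if salutation else ["[salutation]"]
    body = []
    for para in paragraphs:              # body, once per paragraph: NL TB UT $1
        body += ["NL", "TB", "UT"]
        for line in para:                # paragraph, once per line: $1 NL
            body += [line, "NL"]
    # closing: "Sincerely," NL $1="<name>"; COPT prompt "[closing]" if absent
    clo = ["Sincerely,", "NL", closing] if closing else ["[closing]"]
    # memo: NL $1="[salutation]" $2 NL TB TB $3="[closing]"
    return ["NL", *sal, *body, "NL", "TB", "TB", *clo]


if __name__ == "__main__":
    print(render(memo_tokens("Pat", [["Thank you for the report.",
                                      "It arrived on time."]], "Lee")))
```

Calling render(memo_tokens(None, [], None)) shows the partially complete memo as its two prompts, with "[closing]" indented by two tab stops.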

APPENDIX G: SYSTEM PREDEFINED FUNCTIONS



The following is a list of programming language primitive
operations, derived in part from [Pratt, 1975], which could
be implemented as System Predefined Functions. This list is
not intended as a comprehensive collection of the primitives
desired, or even required, for implementation of a GDE
system. Rather, these functions are presented here as an
indication of the classes of operations which might be made
available in support of users of the GDE.

Synthesis Operators

 1. NT
 2. COPT
 3. IOPT
 4. LOPT
 5. ALT
 6. TERM
 7. HEAD
 8. ITER
 9. LIST

Arithmetic Operators

10. PLUS
11. MINUS
12. MULT      multiplication
13. DIV       division
14. REM       remainder
15. UPLUS     unary plus
16. UMINUS    unary minus

Relational Operators

17. EQUAL     equality
18. NTEQ      not equal
19. GT        greater than
20. LT        less than
21. GTE       greater than or equal
22. LTE       less than or equal

Boolean Operators

23. AND
24. OR
25. NOT

Assignment Operators

26. ASNA      arithmetic assignment
27. ASNS      string assignment

Sequence Control Operators

28. COND      if-then-else conditional
29. LOOP      generalized loop
30. CASE

Symbol Table and Data Element Operators

32. DECLARE   declaration
33. BLOCK
34. IDENT     identifier
35. NUMBER
36. STRING

System Operators

37. ILD       AST to Language Definition translation

Miscellaneous

38. NOP       null operation
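
One way such a list might be realized is as a dispatch table from operator names to host-language functions, applied to tree nodes bottom-up. The Python sketch below covers only the arithmetic, relational and Boolean operators, COND and NOP. The tuple representation of a node, the function name evaluate, and the integer semantics chosen for DIV and REM are assumptions, not part of the design described here. AND, OR and COND are evaluated lazily, so only the operands that are needed are computed.

```python
import operator

# Hypothetical tree encoding: a node is a tuple (OPERATOR, operand, ...);
# anything that is not a tuple is taken to be an already-evaluated value.
PRIMITIVES = {
    "PLUS": operator.add,
    "MINUS": operator.sub,
    "MULT": operator.mul,
    "DIV": operator.floordiv,   # assumed: integer division
    "REM": operator.mod,        # assumed: Python's remainder semantics
    "UPLUS": operator.pos,
    "UMINUS": operator.neg,
    "EQUAL": operator.eq,
    "NTEQ": operator.ne,
    "GT": operator.gt,
    "LT": operator.lt,
    "GTE": operator.ge,
    "LTE": operator.le,
    "NOT": operator.not_,
    "NOP": lambda: None,        # null operation
}


def evaluate(node):
    """Evaluate a node by dispatching on its operator name."""
    if not isinstance(node, tuple):
        return node
    op, *args = node
    if op == "AND":             # right operand evaluated only if needed
        return evaluate(args[0]) and evaluate(args[1])
    if op == "OR":
        return evaluate(args[0]) or evaluate(args[1])
    if op == "COND":            # if-then-else: only one branch is evaluated
        test, then_part, else_part = args
        return evaluate(then_part) if evaluate(test) else evaluate(else_part)
    return PRIMITIVES[op](*(evaluate(a) for a in args))


if __name__ == "__main__":
    print(evaluate(("PLUS", 2, ("MULT", 3, 4))))
```

Keeping the operators in a table rather than in the evaluator's control flow means a new primitive can be added without touching evaluate, which matches the intent that the list above be open-ended.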

APPENDIX H: FIGURES



[Diagram not recoverable from the scanned text: a parse tree,
rooted at <root> and with abbreviated non-terminal names, for
the following program.]

      program tree (input, output);
      var a : integer;
      begin
         a := 1
      end.

Figure 1. Parse tree for a trivial program.

CONCATENATION:
    c : x1 x2 ... xn ,   xk = { rk | "["rk"]" | tk }

    <c>       =>  <rk>                      if xk = rk
                  copt(rk)                  if xk = "["rk"]"
    copt(r)   =>  <r>

ALTERNATION:
    a : r1 "|" r2 "|" ... "|" rn

    <a>       =>  { <r1> | <r2> | ... | <rn> }

ITERATION:
    i : "+" r

    <i>       =>  <r> iopt(i)
    iopt(i)   =>  <r> iopt(i)

LIST:
    l : r1 x ... ,   x = { r2 | "["r2"]" | t }

    <l>       =>  <r1> lopt(l)
    lopt(l)   =>  <r2> <r1> lopt(l)         if x = r2
                  copt(r2) <r1> lopt(l)     if x = "["r2"]"
                  <r1> lopt(l)              if x = t

PREDEFINED:
    p : pdf

    <p>       =>  pdf(p)

UNDEFINED:
    <u>       =>  <u>

    c in C = { concatenation rules }
    a in A = { alternation rules }
    i in I = { iteration rules }
    l in L = { list rules }
    p in P = { predefined rules }
    u in U = { undefined rules }
    r in R = C ∪ A ∪ I ∪ L ∪ P ∪ U
    t in T = { terminal symbols }

Figure 2. Transformations
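
Because the transformations of Figure 2 are driven by the rules alone, a synthesizer can apply them by looking a rule up rather than by parsing. The Python sketch below applies one transformation to a single symbol and prints the result in the figure's notation. The tuple encoding of rules and symbols, the function names expand and show, and the choice parameter used to select an alternative are assumptions. As in the figure, terminal symbols are left implicit, and an undefined rule rewrites to itself.

```python
# Hypothetical rule encoding (names are illustrative only):
#   ("CONCAT", [("nt", r) or ("opt", r), ...])      c : x1 x2 ... xn
#   ("ALT", [r1, r2, ..., rn])                       a : r1 | r2 | ... | rn
#   ("ITER", r)                                      i : + r
#   ("LIST", r1, ("nt", r2) | ("opt", r2) | ("term", t))    l : r1 x ...
#   ("PREDEF", pdf)                                  p : pdf
# Any name missing from the grammar is an undefined rule.
# A symbol is (kind, name) with kind in {"nt", "copt", "iopt", "lopt", "pdf"}.


def expand(sym: tuple, grammar: dict, choice: int = 0) -> list:
    """Apply one transformation of Figure 2 to the symbol sym."""
    kind, name = sym
    if kind == "copt":                          # copt(r) => <r>
        return [("nt", name)]
    if kind == "iopt":                          # iopt(i) => <r> iopt(i)
        return [("nt", grammar[name][1]), ("iopt", name)]
    if kind == "lopt":
        _, r1, (xkind, x) = grammar[name]
        if xkind == "nt":                       # x = r2
            return [("nt", x), ("nt", r1), ("lopt", name)]
        if xkind == "opt":                      # x = "[" r2 "]"
            return [("copt", x), ("nt", r1), ("lopt", name)]
        return [("nt", r1), ("lopt", name)]     # x = t
    if kind != "nt":
        raise ValueError(f"{kind}({name}) has no transformation")
    rule = grammar.get(name)
    if rule is None:                            # <u> => <u>
        return [sym]
    tag = rule[0]
    if tag == "CONCAT":                         # <rk> or copt(rk) for each xk
        return [("nt", r) if k == "nt" else ("copt", r) for k, r in rule[1]]
    if tag == "ALT":                            # one of <r1> ... <rn>
        return [("nt", rule[1][choice])]
    if tag in ("ITER", "LIST"):                 # <r> iopt(i) or <r1> lopt(l)
        return [("nt", rule[1]), ("iopt" if tag == "ITER" else "lopt", name)]
    if tag == "PREDEF":                         # <p> => pdf(p)
        return [("pdf", name)]
    raise ValueError(f"unknown rule form {tag!r}")


def show(form: list) -> str:
    """Write a sentential form in the notation of Figure 2."""
    return " ".join(f"<{n}>" if k == "nt" else f"{k}({n})" for k, n in form)


if __name__ == "__main__":
    g = {
        "memo": ("CONCAT", [("opt", "salutation"), ("nt", "body"),
                            ("opt", "closing")]),
        "body": ("ITER", "paragraph"),
        "names": ("LIST", "id", ("term", ",")),
        "id": ("PREDEF", "String"),
    }
    print(show(expand(("nt", "memo"), g)))
```

With the memorandum rules encoded this way, expanding <memo> yields copt(salutation) <body> copt(closing).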

CONCATENATION:
    c : x1 x2 ... xn ,   xk = { rk | "["rk"]" | tk }

    NT,c      =>  NT,rk                     if xk = rk
                  COPT,rk                   if xk = "["rk"]"
    COPT,r    =>  NT,r

ALTERNATION:
    a : r1 "|" r2 "|" ... "|" rn

    NT,a      =>  { NT,r1 | NT,r2 | ... | NT,rn }

ITERATION:
    i : "+" r

    NT,i      =>  NT,r IOPT,i
    IOPT,i    =>  NT,r IOPT,i

LIST:
    l : r1 x ... ,   x = { r2 | "["r2"]" | t }

    NT,l      =>  NT,r1 LOPT,l
    LOPT,l    =>  NT,r2 NT,r1 LOPT,l        if x = r2
                  COPT,r2 NT,r1 LOPT,l      if x = "["r2"]"
                  NT,r1 LOPT,l              if x = t

PREDEFINED:
    p : pdf

    NT,p      =>  PDF(p),p

UNDEFINED:
    NT,u      =>  NT,u

    c in C = { concatenation rules }
    a in A = { alternation rules }
    i in I = { iteration rules }
    l in L = { list rules }
    p in P = { predefined rules }
    u in U = { undefined rules }
    r in R = C ∪ A ∪ I ∪ L ∪ P ∪ U
    t in T = { terminal symbols }

Figure 3. Labelled Transformations

CONCATENATION:
    c : x1 x2 ... xn ,   xk = { rk | "["rk"]" | tk }

    NT,c      =>  NT,rk                     if xk = rk
                  COPT,rk                   if xk = "["rk"]"
    COPT,r    =>  NT,r

ALTERNATION:
    a : "{" r1 "|" r2 "|" ... "|" rn "}"

    NT,a      =>  ALT,a
    ALT,a     =>  { NT,r1 | NT,r2 | ... | NT,rn }

ITERATION:
    i : "+" r

    NT,i      =>  NT,r IOPT,i
    IOPT,i    =>  NT,r IOPT,i

LIST:
    l : r1 x ... ,   x = { r2 | "["r2"]" | t }

    NT,l      =>  NT,r1 LOPT,l
    LOPT,l    =>  NT,r2 NT,r1 LOPT,l        if x = r2
                  COPT,r2 NT,r1 LOPT,l      if x = "["r2"]"
                  NT,r1 LOPT,l              if x = t

PREDEFINED:
    p : pdf

    NT,p      =>  TERM,p
    TERM,p    =>  PDF(p),p

UNDEFINED:
    NT,u      =>  NT,u

    c in C = { concatenation rules }
    a in A = { alternation rules }
    i in I = { iteration rules }
    l in L = { list rules }
    p in P = { predefined rules }
    u in U = { undefined rules }
    r in R = C ∪ A ∪ I ∪ L ∪ P ∪ U
    t in T = { terminal symbols }

Figure 4. Extended Transformations
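
The extended transformations make it possible to separate steps the system can take on its own from steps that need a decision by the user. As a sketch under assumptions, the following Python code applies the Figure 4 rules with step, and synthesize rewrites NT and TERM nodes automatically while leaving COPT, IOPT, LOPT and ALT nodes in place as placeholders; the result is the frontier of a partially complete tree. The rule encoding, the function names, and this particular split between automatic and user-driven steps are illustrative choices, not taken from the design above. A concatenation rule that requires itself would make synthesize recurse without end; an editor would more likely expand nodes on demand.

```python
from typing import Optional

# Hypothetical rule encoding, as (tag, ...) tuples:
#   ("CONCAT", [("nt", r) or ("opt", r), ...]), ("ALT", [r1, ..., rn]),
#   ("ITER", r), ("LIST", r1, ("nt" | "opt" | "term", x)), ("PREDEF", pdf).
# A node is a (label, rule-name) pair; ("PDF", p) stands for PDF(p),p.

AUTOMATIC = {"NT", "TERM"}  # assumed: labels that need no user decision


def step(label: str, name: str, grammar: dict,
         choice: int = 0) -> Optional[list]:
    """Apply one transformation of Figure 4, or return None if none applies."""
    if label == "COPT":                            # COPT,r => NT,r
        return [("NT", name)]
    rule = grammar.get(name)
    if rule is None:                               # NT,u => NT,u (no change)
        return None
    tag = rule[0]
    if label == "NT":
        if tag == "CONCAT":                        # NT,rk or COPT,rk per xk
            return [("NT", r) if k == "nt" else ("COPT", r) for k, r in rule[1]]
        if tag == "ALT":                           # NT,a => ALT,a
            return [("ALT", name)]
        if tag == "ITER":                          # NT,i => NT,r IOPT,i
            return [("NT", rule[1]), ("IOPT", name)]
        if tag == "LIST":                          # NT,l => NT,r1 LOPT,l
            return [("NT", rule[1]), ("LOPT", name)]
        if tag == "PREDEF":                        # NT,p => TERM,p
            return [("TERM", name)]
    if label == "ALT":                             # ALT,a => chosen NT,rk
        return [("NT", rule[1][choice])]
    if label == "IOPT":                            # IOPT,i => NT,r IOPT,i
        return [("NT", rule[1]), ("IOPT", name)]
    if label == "LOPT":
        r1, (xkind, x) = rule[1], rule[2]
        if xkind == "nt":                          # x = r2
            return [("NT", x), ("NT", r1), ("LOPT", name)]
        if xkind == "opt":                         # x = "[" r2 "]"
            return [("COPT", x), ("NT", r1), ("LOPT", name)]
        return [("NT", r1), ("LOPT", name)]        # x = t
    if label == "TERM":                            # TERM,p => PDF(p),p
        return [("PDF", name)]
    return None


def synthesize(label: str, name: str, grammar: dict) -> list:
    """Rewrite NT and TERM nodes until only placeholders and leaves remain."""
    if label not in AUTOMATIC:
        return [(label, name)]
    result = step(label, name, grammar)
    if result is None:
        return [(label, name)]
    frontier = []
    for child_label, child_name in result:
        frontier += synthesize(child_label, child_name, grammar)
    return frontier
```

For the memorandum grammar, synthesizing NT,memo stops at COPT,salutation and COPT,closing, and at the IOPT placeholders of body and paragraph, which is where the user would be asked to continue.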

Figure 5. System Architecture (Data Flow)